14 51 GGGCCGCCA7 C wGCGCCrTG 77CC7GGGC7 7C77GGGGGC GGCGGGCAGC 
1501 ACCA7GGCCC CCGCCAGCG7 GACCCTGACC G7GCAGGCCC ' CCCTGCTCCT ' 
1551 GAGCGGCA7C ^TGCAGCAGC AGAACAACC7 CC7CCGCGCC A7CGAGGCCC 
AGCACCA7AT QC7CCAGC7C ACCG7G7GGG GCA7CAAGCA CC7CGAGGCC 

:==: zzzzrz~~zz c:zrzz\zcz ctacctsXag gaccagcacc "ctccc™ 

ztgggcctgc t:c=gcaagc toatctgcac caccacggta ccctggaacg 1 

1"S1 CC7CC73GAG CAACAAGAGv C73GACGACA 7CT3GAACAA CA7GACC73G 

13:i A7CCAG7GCG A,3CGC3AGA7 7GA7AAC7AG ACCAGCC73A 7C7ACAGC77 

l=5i GC7GGAGAAG A3CCAGACCC AGCAG3AGAA GAACGAGCAG GAGCT3C7GG 

13 CI AGCTGGACAA CrGGGCGAGC C7S7GGAAC7 3G77C5ACAT CACCAAC73G 

135: rrrrrsGTACA rr.vAAATrrr zatcatgatt gtgggcggcg tggtgggcgt 



:::: "gca7CCTg ttzzzzz—z t^agcatcgt -aaccgcgtg cgccagggct 

i 

::s: acagccgcct g*gcctccag acccsgcgcc ccgtgcggcg cgggcccgac 
cgccctgacg gcatcgagga sgagggcggc gagcgcgacc gcgacaccag 

Zlzl TGGCACGCTC 57CCACGGC7 7CC7GGCGA7 CA7C7GGG7C GACC7CCGCA 

:::: ccctcttcct ^ttcagctac caccaccgcg acgtgctgct gatcgccgcc 

225 : C3CA7CG7CG AAC7C77AGG CGGCCGCGGC 7GGGACG7GC 7CAAG7AC7G 

13 Ci C-7GGAACG7C C7CCAG7A77 GGAGCCAGGA GC7GAAG7CC AGCGCCG7CA 

23 5 1 3C"3C7GAA CGCCACCGCC A7CGCCG7GG CCGAGCGCAC CGACCGC37G 

1 A7CCAGG7GC TCCAGAGGGC C3GGAGGGC3 A7CC7GCACA 7CCGCACrZG 

:;s: :atccgccag ;ggc7cgaga 3ggcgc7cc7 3 CSeQ. 15 k)02s*> 



^6 . 1 






WO 96/09378 . 1 




PCT/US95/11511 






1 A^CUAGAAGG ™G»GAG C37GTACTAC siW^g T^KAAGOA 
Si GGGCAGGAGC AIGCTCTTCT C^CGAAGGCG 7AGGACACC3 

::: ajgtggacaa c;Tcr=cccr AccrAGGc-r gggtgcggag ggakccaac 

-5- CCGCAGGAGG T-AGCTrGT SAACSTtsirc GAGAACTTCA ACATSTGGAA I 
i:i GAACAACA7G C7G3AGCAGA 7"A7~AGGA CA7CA7CAGC "COGGA" 

:s: agaggg7gaa <*zzt--rs-z aagctsaccc rrrr^TGCGT gaccctcaac 

TGCACCGAr- T r?AGGAACAC :aCGAACACG AACAACACCA CCSCCV.CAA 
GAACARCAAC A3C3AGGGGA CCATCAAGGG C3GGCAGA7G AAwAACTGCA 



55; 




•~..GTACA AGC73GATAT C37GAGCA7C CACAACSACA ^CACGAGGTA 
,rr~GCAACA 33AGGG7GA7 CACJ-A^GCC 7GCCGGAAGA 
qCrCATGGGC ATZ'JACrACT GGGGGCCGGG CGGGTTCGCC 
s:: ATCrr-AAC? ^CAAGGAC.V. jaagttcagg GGCAAGGGCA GCTGCAAGAA 

= i CwTCAccAcr tcuag.gca c==aggggat cgggcgggtg gtgaccac— 

"CI A3C7CC7GC7 ^AA^GGAGG 37GGCGGAGG AGGAGGTGGT GATrr^CAGC 
- 1 --AACTI'.A CCGACAACGC ZAAGACCATC A7CG7GCACC TCAATGAGAO 
3 SI JGIGCAGATG AACTGGAGGC 37CCCAAC7A CAACAAGCGG AACCGCATCC 
C3GGC3GGCG 77G7ACACGA CCAACAACA7 CA7C3GGACC 
:AT— CTAGA 3GGAAG73GA ACGAGACCG7 
SCGCCAGATr C^TGAGCAAJV? 73AAGGAGCA 377CAAGAAC AAGAGGATCG 
7377 GAAC-TA OAv-CAGCCGC 33G3AGCGG3 A3A7C373AT ZZA 
AAC7"™3 <ij3GAA7TG77 37AG7GCAAG AGCAGCZGGC 73: 



551 
? 51 



: = ; 



TACGTGGAAC (JGCAACAACA "7GGAACAA CACGACMCG AGGAAGAACA 
wATTACCrT CCAGTGGAAG A73AAGGAGA ; -A7GAACA7 37GGGAGGAG 
201 G7GGGCAAGG CCA7G7ACGC ZZZZZZZXZZ ^AGGGCCAGA 7CC3G73CAG 
2S1 3AGGAAGATC ACCGGTCT T3G73ACCG3 33AC3GC3GC AAWACACC3 

::: acacc.v.-ja cacggaaatg 7733333—3 gcgoc-^-.-.a catgggggac 

35i AAAT73G AG A7 C73AGG737A 3AAG7ACAA.-, -7GG7CAC3A TC3AGCCCC7 
1:1 ZZZZZ7ZZZZ CGCACCAAGG TAAnmcCG 337GG7GGAG G3CGAGAACC 
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,1S01 OGCGCCGACA TGCGCGACAA C7CGACA7CT GAGCTGTACA AG7ACAAGG? 

GS7GACGATC ^GCCCCTGG CCGTGCCCCC CACCAAGGCC AAGC=C==C= 
TGGTGCAGCG C2AGAAGC5C 7AAAGCGCCC =C ( S , 0 IOO ^) 



FIG I 
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1 C7CGAGATCC AT7G7GC7C7 AAAGGAGA7A CC7SGCCAGA CACCC7CACC 
, 51 73CGG7GCCC A«3C7GCCCAG CC7GAGGCAA GAGAAGGCCA GAAACCA7GC 
i:i C3A7GGGG7C 7773CAACC3 C73GCCACC7 737ACC75C7 3GCGA7CC7G 
S"=5=~CC3 T3C7AGCCAC C3AGAAGC73 7GGG7GACCG 737AC7ACGG 
- 201 C37GCCC37G 73GAAGGAGG CCACCACCAC CC7G77C73C 3CCAGCGACG 
251 C3AAGGCG7A C3ACAC33AG G7GCACAAC3 737GGGCCAC CCAGGCG7GC 
2 31 37GCCCACCG ACCCCAACCC CCAGGAGC7G GAGC7C373A AC573ACC5A 
2 51 3AAC77CAAC A7G73GAAGA ACAACA7GG7 5GAGCAGA7G CA7GAGCACA 
4 31 7CA7CA3C37 C7GGGACCA3 AGCC73AAGC CC73C37GAA GC7CACCCCC 
4 51 C7S73C373A C =77 3AAC73 CACC3ACC73 AGGAACACCA CCAACACCAA 
SCI CAACAGCACC VCAACAACA ACAGCAACAG C3ACGGCACC A7CAAGGGCG 
551 3CGAGA7GAA CAAC7GCAGC 77CAACA7CA C3ACCAGCA7 CCGCGACAAG 
601 A7GCAGAAGG. A37ACGCCC7 GC7S7ACAAG C73GATA7C 3 73AGGATCGA 
£=1 CAACGACACC ACCACC7ACC GCC7GA7C7C C7GCAACACC AGCG7GA7CA 
7:1 CCCAGGCC73 QCCCAACA7C AGCT7CCAGC CGA7CCCCAT CCAC7ACTGC ' 
"51 3C7CC33CC3 ^C77C3CCA7 CC7CAAG73C AAC3ACAAGA AG77CA6CGG 
301 CAAGGGCAGC T3CAAGAACG 73AGCACC37 5CAG7GCACC CACGGCA7CC 
3:1 33CC3G7GG7 G^AGCACCCAG C7CC73C7GA ACGGCAGCC7 3GCC3AGGAG 
9C1 3AGG7GG7CA TCCSCAGCCA GAAC77CACC 3ACAAC3CCA AGACCA7CA7 
951 C37GCACC7G AA7GAGAGCG 73CAGATCAA C7GCACGCG7 CCCAACTACA 
10 CI ACAA3C3CAA <*:GCA7CCAC A7C3GCCCC3 3GC3C3CCT7 C7ACACCACC 
1C51 AAGAACA7CA TCGGCACCAT C33CCAGGCC CACTGCAACA 7CTC7AGAGC 
1101 CAAGTGGAAC CACACCC73C GCCAGA7CG7 3AGCAAGC73 AAGGAGCAGT 
1151 7CAAGAACAA CACCA7CG7G 77CAACCAGA GCAGCGGCGG CGACCCCGAC 
1201 A7CG73ATGC ACAGC77CAA C7GCGGCGGC GAAT7C77C7 AC7GCAACAC 
12 SI CAGCCC33TS TTCAACAGCA CC7GGAACGG CAACAACACC 73GAACAACA 
1301 CCACC33CAC CAACAACAAT A7TACCZ7CC AG7GCAAGA7 CAAGCAGATC 
13S1 A7CAACA7G7 GGCAGGAGG7 GGGCAAGCCC A7C7AC5CCC CCCCCA7CGA 
1401 GGGCCAGATC CGG7GCACCA CCAACATCAC CGG7C7CCTC C7GACCCGCG ^ ' 

1451 ACGGCGGCAA CGACACCGAC ACCAACGACA CCGAAA7C77 CCGCCCCGGC ( 5H6ST 1 Of i| ) 
16. A method for preparing a synthetic gene
encoding a protein normally expressed by mammalian cells,
comprising identifying non-preferred and less-preferred
codons in the natural gene encoding said protein and
replacing one or more of said non-preferred and
less-preferred codons with a preferred codon encoding the
same amino acid as the replaced codon.
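
By way of illustration only, the replacement step recited in claim 16 can be sketched in Python. This is a minimal sketch, not the procedure as filed: the table PREFERRED (one codon per amino acid) and the helper names optimize and translate are assumptions introduced for the example, and the standard genetic code is assumed throughout.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: one "preferred" codon per amino acid, chosen for illustration.
# Met, Trp and stop codons have no entry and are therefore left unchanged.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "P": "CCC", "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}


def split_codons(seq):
    """Return the codons of an in-frame DNA sequence (upper-cased)."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(seq))


def optimize(seq):
    """Replace every codon by the preferred codon for the same amino acid."""
    out = []
    for codon in split_codons(seq):
        amino_acid = CODON_TO_AA[codon]
        # Keep the original codon when the table has no preferred entry.
        out.append(PREFERRED.get(amino_acid, codon))
    return "".join(out)
```

Because each codon is only ever exchanged for a synonymous one, translate(optimize(s)) equals translate(s) for any in-frame input, which is a convenient sanity check before a redesigned sequence is synthesized.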
7. The synthetic gene of claim 1 wherein at least
10% of the codons in said natural gene are non-preferred
codons.

8. The synthetic gene of claim 1 wherein at least
50% of the codons in said natural gene are non-preferred
codons.

9. The synthetic gene of claim 1 wherein at least
50% of the non-preferred codons and less preferred codons
present in said natural gene have been replaced by
preferred codons.

10. The synthetic gene of claim 1 wherein at
least 90% of the non-preferred codons and less preferred
codons present in said natural gene have been replaced by
preferred codons.

11. The synthetic gene of claim 1 wherein said
protein is a retroviral or lentiviral protein.

12. The synthetic gene of claim 11 wherein said
protein is an HIV protein.

13. The synthetic gene of claim 12 wherein said
protein is selected from the group consisting of gag,
pol, and env.

14. The synthetic gene of claim 13 wherein said
protein is gp120 or gp160.

15. The synthetic gene of claim 1 wherein said
protein is a human protein.
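
The percentages recited in claims 7 to 10 can be estimated by comparing a natural gene and its synthetic counterpart codon by codon. The following is a minimal sketch under stated assumptions: the set PREFERRED_CODONS is an illustrative table, the function names are hypothetical, and non-preferred and less preferred codons are lumped together as "not preferred" because the sketch does not distinguish the two classes.

```python
# ASSUMPTION: illustrative set of preferred codons, one per amino acid.
PREFERRED_CODONS = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GAG", "GGC", "CAC",
    "ATC", "CTG", "AAG", "CCC", "TTC", "AGC", "ACC", "TAC", "GTG",
}
# Codons for which the table offers no alternative (Met, Trp, stops).
NO_CHOICE = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def split_codons(seq):
    """Return the codons of an in-frame DNA sequence (upper-cased)."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_replaceable(codon):
    """True when the codon is not preferred and an alternative exists."""
    return codon not in PREFERRED_CODONS and codon not in NO_CHOICE


def fraction_not_preferred(natural):
    """Share of all codons in the natural gene that are not preferred."""
    codons = split_codons(natural)
    if not codons:
        return 0.0
    return sum(is_replaceable(c) for c in codons) / len(codons)


def fraction_replaced(natural, synthetic):
    """Share of not-preferred natural codons that became preferred codons."""
    nat, syn = split_codons(natural), split_codons(synthetic)
    if len(nat) != len(syn):
        raise ValueError("sequences must contain the same number of codons")
    targets = [(n, s) for n, s in zip(nat, syn) if is_replaceable(n)]
    if not targets:
        return 0.0
    return sum(s in PREFERRED_CODONS for _, s in targets) / len(targets)
```

For a toy pair such as ATGAATCCAGTAAAA and ATGAACCCCGTAAAG, four of the five natural codons are not preferred and three of those four were replaced.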
1. A synthetic gene encoding a protein normally
expressed in mammalian cells wherein at least one non-
preferred or less preferred codon in the natural gene
encoding said mammalian protein has been replaced by a
preferred codon encoding the same amino acid.

2. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 110% of that
expressed by said natural gene in an in vitro mammalian
cell culture system under identical conditions.

3. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

4. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 200% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

5. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 500% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

6. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least ten times that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 486 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAG TAGAGGACAA 60 


AGAGTAATAA GTTTAACAGC ATGTTTAGTA AATCAAAATT TGAGATTAGA TTGTAGACAT 120

GAAAATAATA CACCTTTGCC AATACAACAT GAATTTTCAT TAACGCGTGA AAAAAAAAAA 180 

CATGTATTAA GTGGAACATT AGGAGTACCA GAACATACAT ATAGAAGTAG AGTAAATTTG 240 

TTTAGTGATA GATTCATAAA AGTATTAACA TTAGCAAATT TTACAACAAA AGATGAAGGA 300 

GATTATATGT CTGAGCTCAG AGTAAGTGGA CAAAATCCAA CAACTAGTAA TAAAACAATA 360 

AATGTAATAA GAGATAAATT AGTAAAATGT GGAGGAATAA GTTTATTAGT ACAAAATACA 420

AGTTGGTTAT TATTATTATT ATTAACTTTA ACTTTTTTAC AAGCAACAGA TTTTATAAGT 480 

TTATGA 486 
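
The row layout used in this listing (groups of ten bases followed by a running base count) allows a simple mechanical consistency check. The sketch below is illustrative only; the function name parse_listing and its return format are assumptions made for the example, not part of the listing.

```python
def parse_listing(rows):
    """Join listing rows into one sequence and report inconsistencies.

    Each row is expected to hold space-separated base groups, optionally
    followed by the running base count printed at the end of the row.
    Returns (sequence, problems), where problems is a list of messages.
    """
    sequence = ""
    problems = []
    for number, row in enumerate(rows, start=1):
        tokens = row.split()
        if not tokens:
            continue  # skip blank rows
        # A trailing all-digit token is the printed running count.
        counter = int(tokens.pop()) if tokens[-1].isdigit() else None
        chunk = "".join(tokens).upper()
        sequence += chunk
        unexpected = sorted(set(chunk) - set("ACGT"))
        if unexpected:
            problems.append(f"row {number}: unexpected characters {unexpected}")
        if counter is not None and counter != len(sequence):
            problems.append(
                f"row {number}: printed count {counter}, counted {len(sequence)}"
            )
    return sequence, problems
```

A row containing a character other than A, C, G or T, or a printed count that disagrees with the number of bases read so far, is reported rather than silently corrected, so that transcription errors can be checked against the original document.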

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

ATGAACCCAG TCATCAGCAT CACTCTCCTG CTTTCAGTCT TGCAGATGTC CCGAGGACAG 60

AGGGTGATCA GCCTGACAGC CTGCCTGGTG AACAGAACCT TCGACTGGAC TGCCGTCATG 120

AGAATAACAC CAACTTGCCC ATCCAGCATG AGTTCAGCCT GACCCGAGAG AAGAAGAAGC 180

ACGTCCTGTC AGGCACCCTG GGGGTTCCCG AGCACACTTA CCGCTCCCGC GTCAACCTTT 240

TCAGTGACCG CTTTATCAAG GTCCTTACTC TAGCCAACTT GACCACCAAG GATGAGGGCG 300

ACTACATGTG TGAACTTCGA GTCTCGGGCC AGAATCCCAC AAGCTCCAAT AAAACTATCA 360

ATGTGATCAG AGACAAGCTG GTCAAGTGTG GTGGCATAAG CCTGCTGGTT CAAAACACTT 420

CCTGGCTGCT GCTGCTCCTG CTTTCCCTCT CCTTCCTCCA AGCCACGGAC TTCATTTCTC 480

TGTGA 485



What is claimed is: 
CTCCACTGCA CCCACCCCAT CCCCCCCCTG CTGACCACCC ACCTCCTGCT CAACCCCACC 720
CTGGCCCAGG AGCAGCTGCT CATCCGCACC GACAACTTCA CCGACAACCC CAAGACCATC 780
ATCGTGCACC TCAATCACAO CCTGCAGATC AACTCCACGC GTCCCAACTA CAACAAOCCC 840
AAGCGCATCC ACATCGOCCC CGCGCOCGCC TTCTACACCA CCAAGAACAT CATCCGCACC 900
ATCCGCCAGC CCCACTGCAA CATCTCTAGA GCCAAGTGGA ACCACACCCT GCGCCAGATC 960
OTGAGCAAGC TGAAGGAGCA GTTCAAGAAC AAGACCATCG TGTTCAACCA GAGCACCGGC 1020
GGCGACCCCG ACATCGTGAT GCACACCTTC AACTCCCCCG GCGAATTCTT CIACTGCAAC 1080
ACCACCCCCC TGTTCAACAG CACCTCGAAC CCCAACAACA CCTGGAACAA CACCACCGGC 1140
AGCAACAACA ATATTACCCT CCAGTGCAAG ATCAAGCAGA TCATCAACAT GTGCCAGGAG 1200
GTGGCCAAGC CCATGTACGC CCCCCCCATC GAGGGCCAGA TCCGGTGCAG CAGCAACATC 1260
ACCCGTCTCC TGCTCACCCG CGACGCCCGC AAGGACACCG ACACCAACCA CACCGAAATC 1320
TTCCGCCCCG CCGGCGCCGA CATCCGCGAC AACTCCAGAT CTGAGCTGTA CAAGTACAAG 1380
GTGGTGACGA TCGAGCCCCT GGGCGTGGCC CCCACCAAGG CCAACCGCCG CGTGGTGCAG 1440
CCCCAGAAGC GCCCCCCCAT CGGCCCCCTG TTCCTCCGCT TCCTGGCGCC GGCGGGCAGC 1500
ACCATGCCGG CCGCCACCCT GACCCTGACC CTGCACGCCC CCCTCCTCCT CACCGGCATC 1560
GTCCAGCAGC AGAACAACCT CCTCCGCCCC ATCCAGCCCC AGCAGCATAT GCTCCAGCTC 1620
ACCCTGTGCG CCATCAAOCA GCTCCAGCCC CCCCTCCTGC CCGTGGAGOG CTACCTGAAG 1680
CACCACCAGC TCCTGGGCTT CTGCGCCTCC TCCGCCAAGC TGATCTGCAC CACCACGGTA 1740
CCCTGGAACG CCTCCTGGAO CAACAAGAGC CTGCACCACA TCTGGAACAA CATCACCTCO 1800
ATGCAGTGGG AGCGCGAGAT CGATAACTAC ACCAGCCTCA TCTACAOCCT GCTGCAGAAG 1860
AGCCAGACCC AGCAGGAGAA GAACCAGCAG CAGCTCCTGC AGCTGGACAA CTGGGCGAGC 1920
CTCTCGAACT GGTTCGACAT CACCAACTCC CTGTGGTACA TCAAAATCTT CATCATCATT 1980
GTGGCCGCCC TGOTGGGCCT CCCCATCGTG TTCGCCCTGC TCACCATCCT GAACCGCGTG 2040
[row illegible in source]
CCCCCCGACC CCATCGACCA CGACGCCCGC CACCCCCACC CCCACACCAC CCGCAGCCTC 2160
CTCCACCCCT TCCTCCCCAT CATCTCGGTC GACCTCCGCA CCCTGTTCCT GTTCAGCTAC 2220
CACCACCCCG ACCTCCTCCT GATCCCCGCC CCCATCGTCC AACTCCTACG CCCCCCCGGC 2280
TGCCAGCTGC TCAACTACTC GTGGAACCTC CTCCACTATT CCACCCACGA CCTGAAGTCC 2340
ACOCCCGTCA CCCTCCTCAA CGCCACCCCC ATCGCCGTGG CCCAGGGCAC CCACCCCGTG 2400
ATCCAGGTGC TCCACACGCC CGCCACCCCC ATCCTCCACA TCCCCACCCG CATCCGCGAC 2460
CGGCTCGAGA GGGCGCTGCT G 2481
CACGGCATCC GCCCGGTGGT GAGCACCCAG CTCCTGCTGA XCGGCAGCCT GGCGGAGGAG 900

GAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATGAT CGTGCACCTG 960

AATGAGAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA ACAAGCCGAA GCGCATCCAC 1020

ATCGGCCCCG GGCGCGCCTT CTACACCACC AAGAACATCA TCGGCACCAT CCGCCAGGCC 1080

CACTGCAACA TCTCTAGAGC CAAGTGGAAC GACACCCTCC GCCAGATCGT GAGGAAGCTG 1140

AAGGACCAGT TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAG 1200

ATCGTGATGC ACAGCTTCAA CTGCGGCGGC GAATTCTTCT ACTGCAACAC CAGCCCCCTG 1260

TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA CCACCGGGAG GAACAACAAT 1320

ATTACCCTCC AGTGCAAGAT CAAGCAGATC ATCAACATGT GGCAGGAGGT GGGCAAGGCC 1380

ATGTACGCCC CCCCCATCGA GGGCCAGATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG 1440

CTGACCCGCG ACGGCGGCAA GGACACCGAC ACCAACGACA CCCAAATCTT CCGCCCCGGC 1500

GGCGGCGACA TGCGCGACAA CTGGACATCT GAGCTGTACA AGTACAAGGT GGTGACGATC 1560

GAGCCCCTGC GCGTGGCCCC CACCAAGGCC AAGCGCCGCG TGGTGCAGCG CGAGAAGCGC 1620

TAAAGCGGCC GC 1632




(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2481 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:



ACCGAGAAGC TGTGGGTGAC CGTGTACTAC CGCGTGCCCG TGTGGAAGGA GGCCACCACC 60

ACCCTGTTCT GCGCCAGCGA CGCCAAGGCG TACGACACCG AGCTGCACAA CCTGTGGCCC 120

ACCCAGGCGT CCGTGCCCAC CGACCCCAAC CCCCAGGAGG TGCAGCTCGT GAACGTGACC 180

GAGAACTTCA ACATGTGCAA GAACAACATG CTGGAGCAGA TGCATGAGGA CATCATCACC 240

CTGTGGGACC AGAGCCTGAA CCCCTGCGTG AAGCTGACCC CCCTGTGCCT GACCCTGAAC 300

TGCACCGACC TGAGCAACAC CACCAACACC AACAACAGCA CCGCCAACAA CAACAGCAAC 360

AGCGAGGGCA CCATCAAGGG CGGCGAGATG AAGAACTGCA GCTTCAACAT CACCACCAGC 420

ATCCGCGACA AGATGCAGAA GGAGTACGCC CTGCTGTACA AGCTGGATAT CGTGAGCATC 480

CACAACGACA GCACCAGCTA CCGCCTGATC TCCTGCAACA CCAGCGTGAT CACCCAGGCC 540

TGCCCCAAGA TCAGCTTCGA GCCCATCCCC ATCCACTACT GCGCCCCCGC CGGCTTCGCC 600

ATCCTGAAGT GCAACGACAA GAAGTTCAGC GGCAAGGGCA GCTGCAAGAA CGTGACCACC 660
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CTC««T« OTCOAC^ TCCAACAACT AGTAATAAAA CAATAAATGT AATAAGA6AT 60
«»TT»CT« AATCTOAGGA ATAAGTTTAT TMT A ««* T»C«OTWO TT»TT»TT»T 120
TATTATTAAG TTTAAGT^TT TTACAAGCAA CAGATTTTAT AAGTTTATGA 170
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CGCGAATTCG CGGCCGCTTC ATAAACTTAT AAAATC 36
(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1632 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
CTCGAGATCC ATTCTGCTCT AAAGGAGATA CCCGGCCACA CACCCTCACC TCCGGTCCCC 60
ACCTCCCCAG GCTCACCCAA GAGAAGGCCA CAAACCATGC CCATGGGCTC TCTGCAACCG 120
CTCCCCACCT TGTACCTGCT GGGCATCCTG CTCGCTTCCG TCCTACCCAC CGAGAAGCTG 180
TGGGTCACCG TGTACTACCO CGTGCCCCTG TGGAAGGAGG CCACCACCAC CCTGTTCTGC 240
CCCACCGACG CCAACCCGTA CGACACCCAG CTCCACAACG TGTGGGCCAC CCAGGCGTCC 300
CTCCCCACCG ACCCCAACCC CCAGGAGGTG GAGCTCCTGA ACGTGACCGA GAACTTCAAC 360
ATGTGGAAGA ACAACATGGT GGACCAGATG CATCAGCACA TCATCAGCCT GTGGGACCAG 420
AGCCTGAACC CCTCCGTGAA GCTGACCCCC CTCTCCGTGA CCCTGAACTG CACCCACCTC 480
ACGAACACCA CCAACACCAA CAACACCACC GCCAACAACA ACAGCAACAG CGAGGGCACC 540
ATCAAGGGCG CCCACATGAA CAACTGCAGC TTCAACATCA CCACCAGCAT CCCCGACAAG 600
ATGCAGAAGG AGTACGCCCT CCTCTACAAC CTGGATATCG TGAGCATCGA CAACGACAGC 660
ACCAGCTACC GCCTGATCTC CTGCAACACC AGCCTCATCA CCCAGGCCTG GCCCAAGATC 720
AGCTTCGACC CCATCCCCAT CCACTACTGC GCCCCCGCCC GCTTCGCCAT CCTGAAGTGC 780
AACGACAAGA ACTTCAGCGC CAAGGCCAGC TGCAAGAACG TGAGCACCCT GCAGTGCACC 840
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CGCGGATCCA CGCGTGAAAA AAAAAAACAT 30
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 149 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

CGTGAAAAAA AAAAACATCT ATTAAGTGGA ACATTAGGAG TACCAGAACA TACATATAGA 60 
AGTACAGTAA TTTGTTTAGT GATAGATTCA TAAAAGTATT AACATTAGCA AATTTTACAA 120 
CAAAAGATGA AGGAGATTAT ATGTGTGAG 149 
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

CGCGAATTCG AGCTCACACA TATAATCTCC 30

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 
CGCGGATCCG AGCTCAGAGT AAGTGGACAA 30 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 170 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
COCOCCCGCC COCTTTAOCO CTTCTCGCGC TGCACCAC 38

(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
CCCCCCCCAT CCAACCTTAC CATGATTCCA GTAATAAGT 39

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 165 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ATGAATCCAG TAATAAGTAT AACATTATTA TTAACTCTAT TACAAATGAG TACAGGACAA 60

[row illegible in source]

CAAAATAATA CAAATTTOCC AATACAACAT CAATTTTCAT TAAOC 165

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CCCGGGCAAT TCACCCCTTA ATCAAAATTC ATCTTC 36

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GGAGACCGCT CATGTTGCTG CTGCACCGGA TCTGGCCCTC 40
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CGAGGGCCAG ATCCGGTGCA GCAGCAACAT CACCGCTCTG 40
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

AACATCACCG GTCTGCTGCT GCTGCTGACC CGCACGGCGG CAAGGACACC GACACCAACG 60 

ACACCGAAAT CTTCCGCGAC GGCGGCAAGG ACACCAACGA CACCGAAATC TTCCGCCCCG 120 

GCGGCGGCGA CATGCGCCAC AACTGGACAT CTGAGCTGTA CAAGTACAAG GTGGTGACGA 180

TCGAGCCCCT GGCCGTGGCC CCCACCAAGC CCAAGCGCGC CGTGGTGCAG CGCCACAAGC 240 

GC 242 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CCC^CXOO, ACGACACCCT GCGCCAGATC CTCACCAACC ^ CTTCAACAAC 60 
AACACCATCC TCTTCACCAG ACCAGCGCCG CCCACCCCCA CAT^to.^ < 
ACTOCCOCOCC CCACCCCCA CATCCTCATC CACACCTTCA 120 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CCACTACAAC AATTCCCCGC CCCAGTTGA 29
(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TCAACTCCCC CGCCGAATTC TTCTACTGC 29

(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 195 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CGCGAATTCT TCTACTGCAA CACCACCCCC CTCTTCAACA GCACCTCCAA CGGCAACAAC 60
ACCTGGAACA ACACCACCGG CACCAACAAC AATATTACCC TCCAGTGCAA GATCAAGCAG 120
ATCATCAACA TGTGGCAGGA GGTGGGCAAG GCCATGTACG CCCCCCCCAT CGAGGGCCAG 180
ATCCCCTCCA GCACC 195

(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

GAGAGCGTGC AGATCAACTG CACGCGTCCC 30

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 120 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AACTGCACGC GTCCCAACTA CAACAAGCGC AAGCGCATCC ACATCGGCCC CGGGCGCGCC 60 
TTCTACACCA CCAAGAACAT CATCGGCACC ATCCTCCAGG CCCACTGCAA CATCTCTAGA 120 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GTCGTTCCAC TTGGCTCTAG AGATGTTGCA 30
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GCAACATCTC TAGAGCCAAG TGGAACGAC 29 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 131 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CAACTTCTTC TCCCCGGCGA AGCCGCCGCC
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CCGCCCCCCC CGCCTTCGCC ATCCTGAAGT GCAACGACAA GAAGTTC 47
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 198 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCCGACAACA AGTTCAGCGG CAAGGGCAGC TGCAAGAACG TCAGCACCGT GCAGTGCACC 60
CACGGCATCC GCCCGGTGGT CAGCACCCAG CTCCTGCTCA ACGGCAGCCT GGCCGAGGAG 120
GAGGTGGTGA TCCGCAGCGA CAACTTCACC GACAACGCCA AGACCATCAT CGTCCACCTC 180
AATCAGACCG TGCAGATC 198

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AGTTGGCACG CGTGCAGTTC ATCTGCACGC TCTC
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

GGCCGCGAGA TO 192 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
GTTGAAGCTG CAGTTCTTCA TCTCGCCGCC CTT 33

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAAGAACTGC AGCTTCAACA TCACCACCAG C 31 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 195 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

AACATCACCA CCAGCATCCG CGACAAGATG CAGAAGGAGT ACCCCCTGCT GTACAAGCTG 60 

GATATCGTGA GCATCGACAA CGACAGCACC AGCTACCGCC TGATCTCCTG CAACACCAGC 120 

GTGATCACCC AGGCCTGCCC CAAGATCAGC TTCGAGCCCA TCCCCATCCA CTACTGCGCC 180 

CCCGCCGGCT TCGCC 195 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
ACCGAGAACC TCTGCCTCAC CGTGTACTAC CCCGTGCCCG TCTGGAAGAG AGCCCACCAC 60
CACCCTCTTC TCCGCCAGCG ACCCCAAGCC GTACCACACC GAGGTGCACA ACCTCTOGCC 120
CACCCAGGCG TGCCTGCCCA CCGACCCCAA CCCCCAGCAC GTGGAGCTCG TGAACGTGAC 180
CGAGAACTTC AACATG 196

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCACCATGTT GTTCTTCCAC ATCTTGAAGT TCTC
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GACCGAGAAC TTCAACATGT GGAAGAACAA CAT 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TGCAAGAACA ACATGCTGGA GCACATCCAT GAGCACATCA TCACCCTGTG GGACCAGACC 
CTCAAGCCCT GCGTGAAGCT GACCCCCTGT GCCTGACCTG AACTGCACCG ACCTGAGGAA 
CACCACCAAC ACCAACACAG CACCGCCAAC AACAACACCA ACAGCCACCG CACCATCAAG 
SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: SEED, BRIAN

(ii) TITLE OF INVENTION: OVEREXPRESSION OF MAMMALIAN AND VIRAL

PROTEINS

(iii) NUMBER OF SEQUENCES: 37

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fish & Richardson

(B) STREET: 225 Franklin Street 

(C) CITY: Boston 

(D) STATE: Massachusetts 

(E) COUNTRY: U.S.A. 

(F) ZIP: 02110-2804 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.308

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/308,286 

(B) FILING DATE: 19-SEP-1994 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: CLARK, PAUL T 

(B) REGISTRATION NUMBER: 30,162 

(C) REFERENCE / DOCKET NUMBER: 00786/226001 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (617) 542-5070 

(B) TELEFAX: (617) 542-8906 

(C) TELEX: 200154 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CGCGGGCTAG CCACCGAGAA GCTG 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 196 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
Use

The synthetic genes of the invention are useful
for expressing a protein normally expressed in
mammalian cells in cell culture ([...] Factor
VII, and Factor IX). The synthetic genes of the
invention are also useful for gene therapy.
were stained with the monoclonal antibody OX-7 at a
dilution of 1:250 at 4°C for 20 min, washed with PBS and
subsequently incubated with a 1:500 dilution of a
FITC-conjugated goat anti-mouse immunoglobulin antiserum.
Cells were washed again, resuspended in 0.5 ml of a
fixing solution, and analyzed on an EPICS XL
cytofluorometer (Coulter).

The following solutions were used in this
procedure:
PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM
KH2PO4, pH adjusted to 7.4); Fixing solution (2%
formaldehyde in PBS).
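
For convenience, the millimolar PBS recipe above can be converted to grams per litre. The sketch below is illustrative only: the molar masses are those of the anhydrous salts (an assumption, since the text does not state which hydrates were used), and the names MOLAR_MASS, PBS_MM and grams_needed are hypothetical.

```python
# Molar masses in g/mol for the ANHYDROUS salts (assumption; hydrated
# forms such as Na2HPO4 x 7 H2O would need different values).
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
}

# PBS composition as given in the text, in mM.
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 4.3, "KH2PO4": 1.4}


def grams_needed(recipe_mm, volume_l=1.0):
    """Convert a millimolar recipe into grams of each salt for volume_l."""
    return {
        salt: millimolar / 1000.0 * MOLAR_MASS[salt] * volume_l
        for salt, millimolar in recipe_mm.items()
    }


if __name__ == "__main__":
    for salt, grams in grams_needed(PBS_MM).items():
        print(f"{salt}: {grams:.2f} g per litre")
```

The pH adjustment to 7.4 is not modelled; the calculation only covers the weighed salts.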

The concentration of gp120 in culture supernatants
was determined using CD4-coated ELISA plates and goat
anti-gp120 antisera in the soluble phase. Supernatants
of 293T cells transfected by calcium phosphate were
harvested after 4 days, spun at 3000 rpm for 10 min to
remove debris and incubated for 12 hours at 4°C on the
plates. After 6 washes with PBS, 100 µl of goat
anti-gp120 antisera diluted 1:200 were added for 2 hours. The
plates were washed again and incubated for 2 hours with a
peroxidase-conjugated rabbit anti-goat IgG antiserum
diluted 1:1000. Subsequently the plates were washed and
incubated for 30 min with 100 µl of substrate solution
containing 2 mg/ml o-phenylenediamine in sodium citrate
buffer. The reaction was finally stopped with 100 µl of
4 M sulfuric acid. Plates were read at 490 nm with a
Coulter microplate reader. Purified recombinant
gp120IIIb was used as a control. The following buffers
and solutions were used in this procedure: Wash buffer
(0.1% NP40 in PBS); Substrate solution (2 mg/ml
o-phenylenediamine in sodium citrate buffer).
The following solutions were used in this
procedure: 2x HEBS buffer (280 mM NaCl, 10 mM KCl, 1.5 mM
sterile filtered); 0.25 M CaCl2 (autoclaved).

Immunoprecipitation

After 48 to 60 hours, medium was exchanged and
cells were incubated for an additional 12 hours in
Cys/Met-free medium containing 200 µCi of 35S-translabel.
Supernatants were harvested and spun for 15 min at 3000
rpm to remove debris. After addition of protease
inhibitors leupeptin, aprotinin and PMSF to 2.5 µg/ml,
50 µg/ml and 100 µg/ml respectively, 1 ml of supernatant was
incubated with either 10 µl of packed protein A sepharose
alone (rTHY-1enveg1rre) or with protein A sepharose and 3
µg of a purified CD4/immunoglobulin fusion protein
(kindly provided by Behring) (all gp120 constructs) at
4°C for 12 hours on a rotator. Subsequently the protein
A beads were washed 5 times for 5 to 15 min each time.
After the final wash 10 µl of loading buffer containing
was added, samples were boiled for 3 min and applied on
7% (all gp120 constructs) or 10% (rTHY-1enveg1rre) SDS
polyacrylamide gels (TRIS pH 8.8 buffer in the resolving,
TRIS pH 6.8 buffer in the stacking gel, TRIS-glycin
running buffer, Maniatis et al. 1989). Gels were fixed
in 10% acetic acid and 10% methanol, incubated with
Amplify for 20 min, dried and exposed for 12 hours.

The following buffers and solutions were used in
this procedure: Wash buffer (100 mM Tris, pH 7.5, 150 mM
NaCl, 5 mM CaCl2, 1% NP-40); 5x Running Buffer (125 mM
Tris, 1.25 M Glycin, 0.5% SDS); Loading buffer (10%
glycerol, 4% SDS, 4% β-mercaptoethanol, 0.02% bromphenol
blue).

293T cells were transfected by calcium phosphate
coprecipitation and analyzed for surface THY-1 expression
after 3 days. After detachment with 1 mM EDTA/PBS, cells
and express HIV-1 IIIB gp120 under the 7.5 mixed
early/late promoter (Earl et al., J. Virol., 65:31,
1991). In all experiments with recombinant vaccinia, cells
were infected at a multiplicity of infection of at least
10.

The following solution was used in this procedure:
AP buffer (100 mM Tris HCl, pH 9.5, 100 mM NaCl, 5 mM
MgCl2).

Cell culture

The monkey kidney carcinoma cell lines CV1 and Cos7, the human
kidney carcinoma cell line 293T, and the human cervix carcinoma
cell line Hela were obtained from the American Tissue Typing
Collection and were maintained in supplemented IMDM. They were
kept on 10 cm tissue culture plates and typically split 1:5 to
1:20 every 3 to 4 days. The following medium was used in this
procedure: Supplemented IMDM (90% Iscove's modified Dulbecco
Medium, 10% calf serum, iron-complemented, heat inactivated 30
min 56°C, 0.3 mg/ml L-glutamine, 25 µg/ml gentamycin, 0.5 mM
β-mercaptoethanol (pH adjusted with 5 M NaOH, 0.5 ml)).

Transfection

Calcium phosphate transfection of 293T cells was performed by
slowly adding, under vortexing, 10 µg plasmid DNA in 250 µl
0.25 M CaCl2 to the same volume of 2x HEBS buffer. After
incubation for 10 to 30 min at room temperature the DNA
precipitate was added to a small dish of 50 to 70% confluent
cells. In cotransfection experiments with rev, cells were
transfected with 10 µg gp120IIIb, gp120IIIbrre, syngp120mnrre or
rTHY-1enveg1rre and 10 µg of pCMVrev or CDM7 plasmid DNA.
formamide, 100 µg/ml denatured salmon sperm DNA); Washing buffer
I (2x SSC, 0.1% SDS); Washing buffer II (0.5x SSC, 0.1% SDS);
20x SSC (3 M NaCl, 0.3 M Na3 citrate, pH adjusted to 7.0).

Vaccinia recombination

Vaccinia recombination used a modification of the method
described by Romeo and Seed (Romeo and Seed, Cell, 64: 1037,
1991). Briefly, CV1 cells at 70 to 90% confluency were infected
with 1 to 3 µl of a wildtype vaccinia stock WR (2 x 10^8 pfu/ml)
for 1 hour in culture medium without calf serum. After 24 hours,
the cells were transfected by calcium phosphate with 25 µg TKG
plasmid DNA per dish. After an additional 24 to 48 hours the
cells were scraped off the plate, spun down, and resuspended in
a volume of 1 ml. After 3 freeze/thaw cycles trypsin was added
to 0.05 mg/ml and lysates were incubated for 20 min. A dilution
series of 10, 1 and 0.1 µl of this lysate was used to infect
small dishes (6 cm) of CV1 cells that had been pretreated with
12.5 µg/ml mycophenolic acid, 0.25 mg/ml xanthin and 1.36 mg/ml
hypoxanthine for 6 hours. Infected cells were cultured for 2 to
3 days, and subsequently stained with the monoclonal antibody
NEA9301 against gp120 and an alkaline phosphatase conjugated
secondary antibody. Cells were incubated with 0.33 mg/ml NBT and
0.16 mg/ml BCIP in AP-buffer and finally overlaid with 1%
agarose in PBS. Positive plaques were picked and resuspended in
100 µl Tris pH 9.0. The plaque purification was repeated once.
To produce high titer stocks the infection was slowly scaled up.
Finally, one large plate of Hela cells was infected with half of
the virus of the previous round. Infected cells were detached in
3 ml of PBS, lysed with a Dounce homogenizer and cleared from
larger debris by centrifugation. VPE-8 recombinant vaccinia
stocks were kindly provided by the AIDS repository, Rockville,
MD,
Slot blot analysis

For slot blot analysis 10 µg of cytoplasmic RNA was dissolved in
50 µl dH2O to which 150 µl of 10x SSC/18% formaldehyde were
added. The solubilized RNA was then incubated at 65°C for 15 min
and spotted onto nitrocellulose membranes with a slot blot
apparatus. Radioactively labelled probes of 1.5 kb gp120IIIb and
syngp120mn fragments were used for hybridization. Each of the
two fragments was random labelled in a 50 µl reaction with 10 µl
of 5x oligo-labelling buffer, 8 µl of 2.5 mg/ml BSA, 4 µl of
α[32P]-dCTP (20 µCi/µl; 6000 Ci/mmol), and 5 U of Klenow
fragment. After 1 to 3 hours incubation at 37°C, 100 µl of TE
were added and unincorporated α[32P]-dCTP was eliminated using a
G50 spin column. Activity was measured in a Beckman
beta-counter, and equal specific activities were used for
hybridization. Membranes were pre-hybridized for 2 hours and
hybridized for 12 to 24 hours at 42°C with 0.5 x 10^6 cpm probe
per ml hybridization fluid. The membrane was washed twice (5
min) with washing buffer I at room temperature, for one hour in
washing buffer II at 65°C, and then exposed to x-ray film.
Similar results were obtained using a 1.1 kb NotI/SfiI fragment
of pCDM7 containing the 3' untranslated region. Control
hybridizations were done in parallel with a random-labelled
human beta-actin probe. RNA expression was quantitated by
scanning the hybridized nitrocellulose membranes with a Magnetic
Dynamics phosphorimager.

The following solutions were used in this procedure: 5x
Oligo-labelling buffer (250 mM Tris HCl, pH 8.0, 25 mM MgCl2,
5 mM β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, 2 mM dTTP, 1 M
Hepes pH 6.6, 1 mg/ml hexanucleotides [dNTP]6); Hybridization
Solution ([...] M sodium phosphate, 250 mM NaCl, 7% SDS, 1 mM
EDTA, 5% dextran sulfate, 50%
transferred to Whatman blotting paper, dried [...] for about 1
hour, and exposed to X-ray film at room temperature.

The following solutions were used in this procedure: 5x
Annealing buffer (200 mM [...]); Termination Mixes ([...] 50 mM
NaCl, [...] µM ddNTP (one each)); Stop solution ([...] EDTA,
0.05% bromphenol blue, 0.05% xylencyanol); 5x TBE (0.9 M Tris
borate, [...]).

RNA isolation

Cytoplasmic RNA was isolated from calcium [...] culture cells,
in Current Protocols in Molecular Biology, Ausubel et al., eds.,
Wiley & Sons, New York, 1992). [...] extracts were incubated at
37°C for 20 min, phenol/chloroform extracted twice, and
precipitated. The RNA was dissolved in 100 µl buffer I and
incubated at 37°C for 20 min. The reaction was stopped by adding
25 µl stop buffer and precipitated again.

The following solutions were used in this procedure: Lysis
Buffer (TE containing 50 mM Tris pH 8.0, 100 mM NaCl, 5 mM
MgCl2, 0.5% NP40); Buffer I (TE [...] RNAse inhibitor, 0.1 U/µl
RNAse free DNAse); Stop buffer (50 mM EDTA, 1.5 M NaOAc, 1.0%
SDS,
Synthetic genes were sequenced by the Sanger dideoxynucleotide
method. In brief, 20 to 50 µg double-stranded plasmid DNA were
denatured in 0.5 M NaOH for 5 min. Subsequently the DNA was
precipitated with 1/10 volume of sodium acetate (pH 5.2) and 2
volumes of ethanol and centrifuged for 5 min. The pellet was
washed with 70% ethanol and resuspended at a concentration of
1 µg/µl. The annealing reaction was carried out with 4 µg of
template DNA and 40 ng of primer in 1x annealing buffer in a
final volume of 10 µl. The reaction was heated to 65°C and
slowly cooled to 37°C. In a separate tube 1 µl of 0.1 M DTT,
2 µl of labeling mix, 0.75 µl of dH2O, 1 µl of [35S] dATP
(10 µCi), and 0.25 µl of Sequenase (12 U/µl) were added for each
reaction. Five µl of this mix were added to each annealed
primer-template tube and incubated for 5 min at room
temperature. For each labeling reaction 2.5 µl of each of the 4
termination mixes were added on a Terasaki plate and prewarmed
at 37°C. At the end of the incubation period 3.5 µl of labeling
reaction were added to each of the 4 termination mixes. After 5
min, 4 µl of stop solution were added to each reaction and the
Terasaki plate was incubated at 80°C for 10 min in an oven. The
sequencing reactions were run on 5% denaturing polyacrylamide
gel. An acrylamide solution was prepared by adding 200 ml of 10x
TBE buffer and 957 ml of dH2O to 100 g of
acrylamide:bisacrylamide (29:1). 5% polyacrylamide 46% urea and
1x TBE gel was prepared by combining 38 ml of acrylamide
solution and 28 g urea. Polymerization was initiated by the
addition of 400 µl of 10% ammonium peroxodisulfate and 60 µl of
TEMED. Gels were poured using silanized glass plates and
sharktooth combs and run in 1x TBE buffer at 60 to 100 W for 2
to 4 hours (depending on the region to be read). Gels were
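
As a quick arithmetic check on the labeling step of the sequencing procedure, the per-reaction volumes listed in the text can be summed and scaled to a batch. This is only a minimal sketch: the dictionary and function names are illustrative and are not part of the procedure, and only the volumes themselves come from the text.

```python
# Per-reaction labeling mix volumes in microliters, as listed in the text.
LABELING_MIX_UL = {
    "0.1 M DTT": 1.0,
    "labeling mix": 2.0,
    "dH2O": 0.75,
    "[35S] dATP": 1.0,
    "Sequenase": 0.25,
}


def master_mix(n_reactions, per_reaction=LABELING_MIX_UL):
    """Scale the per-reaction volumes to a master mix for n reactions."""
    return {name: vol * n_reactions for name, vol in per_reaction.items()}


# Volume of mix that goes into each annealed primer-template tube.
per_reaction_total = sum(LABELING_MIX_UL.values())

# 10 ul annealing reaction plus the mix gives the labeling reaction volume.
labeling_reaction_total = 10.0 + per_reaction_total

# 3.5 ul of labeling reaction goes into each of the 4 termination mixes.
used_for_termination = 4 * 3.5
```

The sum reproduces the "five µl" stated in the text, and the four 3.5 µl aliquots fit within the resulting labeling reaction volume.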
cheesecloth into a 250 ml bottle. [...] to the top and the
bottle was spun at 4,200 rpm for 10 min. The pellet was
resuspended in 4.1 ml of solution I [...] ethidium bromide, and
0.1 ml of 1% Triton X100 solution. [...] were spun in a Beckman
high speed centrifuge at 10,000 rpm for 5 min. The supernatant
was transferred into Beckman Quick Seal ultracentrifuge tubes,
which were then sealed and spun in a Beckman ultracentrifuge
using a KVT90 fixed angle rotor at 80,000 rpm for > 2.5 hours.
The band was extracted by visible light using a 1 ml syringe and
20 gauge needle. An equal volume of dH2O was added to the
extracted material. DNA was extracted once with n-butanol
saturated with 1 M sodium chloride, followed by addition of an
equal volume of 10 M ammonium acetate/1 mM EDTA. The material
was poured into a 13 ml snap tube which was then filled to the
top with absolute ethanol, mixed, and spun in a Beckman J2
centrifuge at 10,000 rpm for 10 min. The pellet was rinsed with
70% ethanol and resuspended in 0.5 to 1 ml of H2O. The DNA
concentration was determined by measuring the optical density at
260 nm in a dilution of 1:200 (1 OD260 = 50 µg/ml).
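
The concentration calculation implied by the optical density measurement can be sketched as follows. The function name and the example readings (0.25 and 0.5) are hypothetical; only the dilution factors and conversion factors (1:200 with 50 µg/ml per OD260 for plasmid DNA, and 1:333 with 30 µg/ml per OD260 as stated for oligonucleotides elsewhere in this document) come from the text.

```python
def concentration_ug_per_ml(od260, dilution_factor, ug_per_ml_per_od):
    """Concentration of the undiluted stock from an OD260 reading of a dilution."""
    return od260 * dilution_factor * ug_per_ml_per_od


# Plasmid DNA: 1:200 dilution, 1 OD260 = 50 ug/ml.
# The reading of 0.25 is a made-up example value.
plasmid_stock = concentration_ug_per_ml(0.25, 200, 50)

# Oligonucleotides: 1:333 dilution, 1 OD260 = 30 ug/ml.
# The reading of 0.5 is a made-up example value.
oligo_stock = concentration_ug_per_ml(0.5, 333, 30)
```

With these example readings the plasmid stock works out to 2500 µg/ml (2.5 mg/ml) and the oligonucleotide stock to 4995 µg/ml.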

The following media and buffers were used in these procedures:
M9 bacterial medium (10 g M9 salts, 10 g casamino acids
(hydrolysed), 10 ml M9 additions, 7.5 µg/ml tetracycline (500 µl
of a 15 mg/ml stock solution), 12.5 µg/ml ampicillin (125 µl of
a 10 mg/ml stock solution)); M9 additions (10 mM CaCl2, 100 mM
MgSO4, 200 µg/ml thiamine, 70% glycerol); LB medium (1.0% NaCl,
0.5% yeast extract, 1.0% trypton); Solution I (10 mM EDTA pH
8.0); Solution II (0.2 M NaOH, 1.0% SDS); Solution III (2.5 M
KOAc, 2.5 M HOAc)
was complemented with 10% DMSO to increase fidelity of the Taq
polymerase.

Small scale DNA preparation

Transformed bacteria were grown in 3 ml LB cultures for more
than 6 hours or overnight. Approximately 1.5 ml of each culture
was poured into 1.5 ml microfuge tubes, spun for 20 seconds to
pellet cells and resuspended in 200 µl of solution I.
Subsequently 400 µl of solution II and 300 µl of solution III
were added. The microfuge tubes were capped, mixed and spun for
> 30 sec. Supernatants were transferred into fresh tubes and
phenol extracted once. DNA was precipitated by filling the tubes
with isopropanol, mixing, and spinning in a microfuge for > 2
min. The pellets were rinsed in 70% ethanol and resuspended in
50 µl dH2O containing 10 µl of RNAse A. The following media and
solutions were used in these procedures: LB medium (1.0% NaCl,
0.5% yeast extract, 1.0% trypton); solution I (10 mM EDTA pH
8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III (2.5 M
KOAc, 2.5 M glacial acetic acid); phenol (pH adjusted to 6.0,
overlaid with TE); TE (10 mM Tris HCl, pH 7.5, 1 mM EDTA pH
8.0).
Large scale DNA preparation 

One liter cultures of transformed bacteria were grown 24 to 36
hours (MC1061p3 transformed with pCDM derivatives) or 12 to 16
hours (MC1061 transformed with pUC derivatives) at 37°C in
either M9 bacterial medium (pCDM derivatives) or LB (pUC
derivatives). Bacteria were spun down in 1 liter bottles using a
Beckman J6 centrifuge at 4,200 rpm for 20 min. The pellet was
resuspended in 40 ml of solution I. Subsequently, 80 ml of
solution II and 40 ml of solution III were added and the bottles
were shaken semivigorously until lumps of 2 to 3 mm size
developed. The bottle was spun at 4,200 rpm for 5 min and the
supernatant was poured through
oligo 1 reverse (EcoRI/MluI): cgc ggg gaa ttc acg cgt taa tga
aaa ttc atg ttg (SEQ ID NO: 27).

oligo 2 forward (BamHI/MluI): [...] gga tcc acg cgt gaa aaa aaa
aaa cat (SEQ ID NO: 28).

oligo 2: [...] aaa cat gta tta agt gga [...] agt aga [...] ttg
ttt agt gat aga ttc ata aaa gta tta aca tta gca aat [...] tat
atg tgt gag (SEQ ID NO: 29).

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc aca cat ata
atc tcc (SEQ ID NO: 30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc aga gta agt
gga caa (SEQ ID NO: 31).

oligo 3: [...] gta agt gga caa aat cca a[...] agt agt aat aaa
aca ata aat gta ata aga gat aaa tta gta aaa tgt ga gga ata agt
tta tta gta caa aat aca agt tgg tta tta tta tta tta tta agt tta
agt ttt tta caa gca aca gat ttt ata agt tta tga (SEQ ID NO: 32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc gct tca taa
act tat aaa atc (SEQ ID NO: 33).
Polymerase Chain Reaction

Short, overlapping 15 to 25 mer oligonucleotides annealing at
both ends were used to amplify the long oligonucleotides by
polymerase chain reaction (PCR). Typical PCR conditions were: 35
cycles, 55°C annealing temperature, 0.2 sec extension time. PCR
products were gel purified, phenol extracted, and used in a
subsequent PCR to generate longer fragments consisting of two
adjacent small fragments. These longer fragments were cloned
into a CDM7-derived plasmid containing a leader sequence of the
CD5 surface molecule followed by a NheI/PstI/MluI/EcoRI/BamHI
polylinker.
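
The idea of building a longer fragment from two adjacent fragments that share an identical end region can be illustrated in a few lines. This is a hedged sketch of the sequence logic only, not of the PCR chemistry: the function names, the minimum overlap of 10 bases, and the two toy sequences are assumptions made for illustration and do not come from the procedure.

```python
# Translation table for complementing lowercase DNA.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def join_by_overlap(left, right, min_overlap=10):
    """Join two fragments whose ends share an identical overlap.

    The longest overlap between the 3' end of `left` and the 5' end of
    `right` is used; a ValueError is raised if none reaches min_overlap.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no overlap of the required length")


# Toy fragments (hypothetical, not from the oligonucleotide lists).
left_fragment = "acgtacgtaaccggttaagg"
right_fragment = "aaccggttaaggcatgcatg"
joined = join_by_overlap(left_fragment, right_fragment)
```

A reverse primer written 5' to 3' can be compared with the top strand by passing it through `reverse_complement` first.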

The following solutions were used in these reactions: 10x PCR
buffer (500 mM KCl, 100 mM Tris HCl, pH 7.5, 8 mM MgCl2, 2 mM
each dNTP). The final buffer
oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc gtg agc aag ctg
aag gag cag ttc aag aac aag acc atc gtg ttc ac cag agc agc ggc
ggc gac ccc gag atc gtg atg cac agc ttc aac tgc ggc ggc (SEQ ID
NO: 17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc gcc gca gtt ga
(SEQ ID NO: 18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat tct tct act gc
(SEQ ID NO: 19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc ctg ttc aac agc
acc tgg aac ggc aac aac acc tgg aac aac acc acc ggc agc aac aac
aat att acc ctc cag tgc aag atc aag cag atc atc aac atg tgg cag
gag gtg ggc aag gcc atg tac gcc ccc ccc atc gag ggc cag atc cgg
tgc agc agc (SEQ ID NO: 20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc acc gga tct ggc
cct c (SEQ ID NO: 21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag caa cat cac cgg
tct g (SEQ ID NO: 22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac ggc ggc aag gac
acc gac acc aac gac acc gaa atc ttc cgc ccc ggc ggc ggc gac atg
cgc gac aac tgg aga tct gag ctg tac aag tac aag gtg gtg acg atc
gag ccc ctg ggc gtg gcc ccc acc aag gcc aag cgc cgc gtg gtg cag
cgc gag aag cgc (SEQ ID NO: 23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag cgc ttc tcg cgc
tgc acc ac (SEQ ID NO: 24).

The following oligonucleotides were used for the construction of
the ratTHY-1env gene.

oligo 1 forward (BamHI/Hind3): cgc ggg gga tcc aag ctt acc atg
att cca gta ata agt (SEQ ID NO: 25).

oligo 1: atg aat cca gta ata agt ata aca tta tta tta agt gta tta
caa atg agt aga gga caa aga gta ata agt tta aca gca tct tta gta
aat caa aat ttg aga tta gat tgt aga cat gaa aat aat aca aat ttg
cca ata caa cat gaa ttt tca tta acg (SEQ ID NO: 26).
[...] (SEQ ID NO: 5).

oligo 2 reverse (PstI): [...]

oligo 3: [...] gag ccc atc ccc atc cac tac [...] (SEQ ID NO: 8).

oligo 3 reverse: gaa ctt ctt gtc ggc ggc gaa gcc ggc ggg (SEQ ID
NO: 9).

oligo 4 forward: gcg ccc ccg ccg gct tcg cca tcc tga agt gca acg
aca aga agt tc (SEQ ID NO: 10).

[...]

oligo 5 forward (MluI): gag agc gtg cag atc aac tgc acg cgt ccc
(SEQ ID NO: 13).

oligo 5: [...] cgt ccc [...] tac [...] aag cgc [...] ccc ggg
[...] tac acc acc aag aac atc atc ggc acc atc ctc cag gcc cac
tgc aac atc tct aga (SEQ ID NO: 14).

oligo 5 reverse: gtc gtt cca ctt ggc tct aga gat gtt gca (SEQ ID
NO: 15).

oligo 6 forward: gca aca tct cta gag cca agt gga acg ac (SEQ ID
NO: 16).
mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Ligation additions (1 mM
ATP, 20 mM DTT, 1 mg/ml BSA, 10 mM spermidine); 50x TAE (2 M
Tris acetate, 50 mM EDTA).
Oligonucleotide synthesis and purification

Oligonucleotides were produced on a Milligen 8750 synthesizer
(Millipore). The columns were eluted with 1 ml of 30% ammonium
hydroxide, and the eluted oligonucleotides were deblocked at
55°C for 6 to 12 hours. After deblocking, 150 µl of
oligonucleotide were precipitated with 10x volume of unsaturated
n-butanol in 1.5 ml reaction tubes, followed by centrifugation
at 15,000 rpm in a microfuge. The pellet was washed with 70%
ethanol and resuspended in 50 µl of H2O. The concentration was
determined by measuring the optical density at 260 nm in a
dilution of 1:333 (1 OD260 = 30 µg/ml).

The following oligonucleotides were used for construction of the
synthetic gp120 gene (all sequences shown in this text are in 5'
to 3' direction).

oligo 1 forward (NheI): cgc ggg cta gcc acc gag aag ctg (SEQ ID
NO: 1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac ggc gtg ccc gtg
tgg aag ag ag gcc acc acc acc ctg ttc tgc gcc agc gac gcc aag
gcg tac gac acc gag gtg cac aac gtg tgg gcc acc cag gcg tgc gtg
ccc acc gac ccc aac ccc cag gag gtg gag ctc gtg aac gtg acc gag
aac ttc aac atg (SEQ ID NO: 2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt tga agt tct c
(SEQ ID NO: 3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa gaa caa cat
(SEQ ID NO: 4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat gag gac atc atc
agc ctg tgg gac cag agc ctg aag ccc tgc gtg aag ctg acc cc ctg
tgc gtg acc tg aac tgc acc gac ctg
Detailed Procedures

The following procedures were used in the above-described
experiments.

Sequence Analysis

Sequence analyses employed the software developed by the
University of Wisconsin Computer Group.

Plasmid constructions

Plasmid constructions employed the following methods. Vectors
and insert DNA were digested at a concentration of 0.5 µg/10 µl
in the appropriate restriction buffer for 1 - 4 hours (total
reaction volume approximately 30 µl). Digested vector was
treated with 10% (v/v) of 1 µg/ml calf intestine alkaline
phosphatase for 30 min prior to gel electrophoresis. Both vector
and insert digests (5 to 10 µl each) were run on a 1.5% low
melting agarose gel with TAE buffer. Gel slices containing bands
of interest were transferred into a 1.5 ml reaction tube, melted
at 65°C and directly added to the ligation without removal of
the agarose. Ligations were typically done in a total volume of
25 µl in 1x Low Buffer 1x Ligation Additions with 200-400 U of
ligase, 1 µl of vector, and 4 µl of insert. When necessary, 5'
overhanging ends were filled by adding 1/10 volume of 250 µM
dNTPs and 2-5 U of Klenow polymerase to heat inactivated or
phenol extracted digests and incubating for approximately 20 min
at room temperature. When necessary, 3' overhanging ends were
filled by adding 1/10 volume of 2.5 mM dNTPs and 5-10 U of T4
DNA polymerase to heat inactivated or phenol extracted digests,
followed by incubation at 37°C for 30 min. The following buffers
were used in these reactions: 10x Low buffer (60 mM Tris HCl, pH
7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Medium buffer (60 mM Tris
HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x High buffer (60
composition. This might indicate that the possibility of high
expression is restored, and that the gene in fact has to be
highly expressed at some point during viral pathogenesis.
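
The comparison of CpG content with base composition that this discussion relies on can be made concrete as an observed/expected ratio. The text does not state how the expectation was computed, so the sketch below assumes the commonly used estimate (number of C times number of G, divided by sequence length); the function name and the toy sequences are illustrative only.

```python
def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from C and G content.

    Expected count is (#C * #G) / length, an assumed normalization.
    Returns NaN when the expected count is zero.
    """
    s = seq.lower()
    n = len(s)
    observed = sum(1 for i in range(n - 1) if s[i:i + 2] == "cg")
    expected = s.count("c") * s.count("g") / n if n else 0.0
    return observed / expected if expected else float("nan")


# Toy inputs: the first sequence has one CpG where one is expected,
# the second has the same composition but no CpG at all.
ratio_balanced = cpg_obs_exp("gcgc")
ratio_depleted = cpg_obs_exp("ggcc")
```

A ratio near 1 means CpG occurs about as often as base composition predicts; a ratio near 0.2 would correspond to the "one fifth" figure quoted for HIV.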

The results presented herein clearly indicate that codon
preference has a severe effect on protein levels, and suggest
that translational elongation is controlling mammalian gene
expression. However, other factors may play a role. First,
abundance of not maximally loaded mRNAs in eukaryotic cells
indicates that initiation is rate limiting for translation in at
least some cases, since otherwise all transcripts would be
completely covered by ribosomes. Furthermore, if ribosome
stalling and subsequent mRNA degradation were the mechanism,
suppression by rare codons could most likely not be reversed by
any regulatory mechanism like the one presented herein. One
possible explanation for the influence of both initiation and
elongation on translational activity is that the rate of
initiation, or access to ribosomes, is controlled in part by
cues distributed throughout the RNA, such that the lentiviral
codons predispose the RNA to accumulate in a pool of poorly
initiated RNAs. However, this limitation need not be kinetic;
for example, the choice of codons could influence the
probability that a given translation product, once initiated, is
properly completed. Under this mechanism, abundance of less
favored codons would incur a significant cumulative probability
of failure to complete the nascent polypeptide chain. The
sequestered RNA would then be lent an improved rate of
initiation by the action of rev. Since adenine residues are
abundant in rev-responsive transcripts, it could be that RNA
adenine methylation mediates this translational suppression.
a secreted molecule, the induction by rev was much more
prominent, supporting the above hypothesis. This can probably be
explained by accumulation of secreted protein [...], which
considerably amplifies the rev effect. If rev only induces a
minor increase for surface molecules in general, induction of
HIV envelope by rev cannot have the purpose of an increased
surface [...] of an increased intracellular gp160 level. It is
completely unclear at the moment why this should be the case.

To test whether small subtotal elements of a gene are sufficient
to restrict expression and render it rev-dependent,
rTHY-1env:immunoglobulin fusion proteins were generated, in
which only about one third of the total gene had the envelope
codon usage. Expression levels of this construct were on an
intermediate level, indicating that the rTHY-1env negative
sequence element is not dominant over the immunoglobulin part.
This fusion protein was not or only slightly rev-responsive,
indicating that only genes almost completely suppressed can be
rev-responsive.

Another characteristic feature that was found in the codon
frequency tables is a striking underrepresentation of CpG
triplets. In a comparative study of codon usage in E. coli,
yeast, drosophila and primates it was shown that in a high
number of analyzed primate genes the 8 least used codons contain
all codons with the CpG dinucleotide sequence. Avoidance of
codons containing this dinucleotide motif was also found in the
sequence of other retroviruses. It seems plausible that the
reason for underrepresentation of CpG-bearing triplets has
something to do with avoidance of gene silencing by methylation
of CpG cytosines. The expected number of CpG dinucleotides for
HIV as a whole is about one fifth that expected on the basis of
the base
expression is due to translational differences and not mRNA
stability.

Retroviruses in general do not show a similar preference towards
A and T as found for HIV. But if this family was divided into
two subgroups, lentiviruses and non-lentiviral retroviruses, a
similar preference to A and, less frequently, T, was detected at
the third codon position for lentiviruses. Thus, the availing
evidence suggests that lentiviruses retain a characteristic
pattern of envelope codons not because of an inherent advantage
to the reverse transcription or replication of such residues,
but rather for some reason peculiar to the physiology of that
class of viruses. The major difference between lentiviruses and
non-complex retroviruses is the additional regulatory and
non-essential accessory genes in lentiviruses, as already
mentioned. Thus, one simple explanation for the restriction of
envelope expression might be that an important regulatory
mechanism of one of these additional molecules is based on it.
In fact, it is known that one of these proteins is rev, which
most likely has homologues in all lentiviruses. Thus codon usage
in viral mRNA is used to create a class of transcripts which is
susceptible to the stimulatory action of rev. This hypothesis
was proved using a similar strategy as above, but this time
codon usage was changed into the inverse direction. Codon usage
of a highly expressed cellular gene was substituted with the
most frequently used codons in the HIV envelope. As assumed,
expression levels were considerably lower in comparison to the
native molecule, almost two orders of magnitude when analyzed by
immunofluorescence of the surface expressed molecule (see 4.7).
If rev was coexpressed in trans and a RRE element was present in
cis only a slight induction was found for the surface molecule.
However, if THY-1 was expressed as




rTHY-1env did not restrict expression to an equal level [...]
rTHY-1env [...] regulation by rev appears to be ineffective if
protein expression is not almost completely suppressed.

Direct comparison between codon usage frequency of HIV envelope
and highly expressed human genes reveals a striking difference
for all twenty amino acids. One simple measure of the
statistical significance of this codon preference is the finding
that among the nine amino acids with two fold codon degeneracy,
the favored third residue is A or U in all nine. The probability
that all nine of two equiprobable choices will be the same is
approximately 0.004, and hence by any conventional measure the
third residue choice cannot be considered random. Further
evidence of a skewed codon preference is found among the more
degenerate codons, where a strong selection for triplets bearing
adenine can be seen. This contrasts with the pattern for highly
expressed genes, which favor codons bearing C, or less commonly
G, in the third position of codons with three or more fold
degeneracy.
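
The figure of approximately 0.004 follows from elementary counting: nine independent two-way choices that all agree, with either of the two outcomes allowed as the common one. A minimal sketch, with an illustrative function name that is not from the text:

```python
from fractions import Fraction


def prob_all_same(n_choices, n_options=2):
    """Probability that n independent, equiprobable choices all agree.

    Any of the n_options outcomes may be the shared one, hence the
    leading factor n_options.
    """
    return n_options * Fraction(1, n_options) ** n_choices


# Nine amino acids with two fold codon degeneracy.
p_nine = prob_all_same(9)
```

This gives 2 x (1/2)^9 = 1/256, about 0.0039. If only the specific outcome "A or U in all nine" were counted, the probability would be half of that, 1/512.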

The systematic exchange of native codons with codons of highly
expressed human genes dramatically increased expression of
gp120. A quantitative analysis by ELISA showed that expression
of the synthetic gene was at least 25 fold higher in comparison
to native gp120 after transient transfection into human 293
cells. The concentration levels in the ELISA experiment shown
were rather low. Since an ELISA was used for quantification
which is based on gp120 binding to CD4, only native,
non-denatured material was detected. This may explain the
apparent low expression. Measurement of cytoplasmic mRNA levels
demonstrated that the difference in protein
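
The codon exchange described in this discussion, replacing each native codon with a synonymous codon favored in highly expressed human genes, can be sketched as a simple lookup. The `PREFERRED` table below is an assumption made for illustration: it matches the codons that appear in the synthetic gp120 oligonucleotides listed in this document (for example gcc, ctg, aag, acc), but it is not copied from a table in this section, and the stop codon choice is arbitrary. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, enumerated in t, c, a, g order for each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed preferred codon for each amino acid (illustrative only).
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
    "Q": "cag", "E": "gag", "G": "ggc", "H": "cac", "I": "atc",
    "L": "ctg", "K": "aag", "M": "atg", "F": "ttc", "P": "ccc",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tac", "V": "gtg",
    "*": "tga",
}


def translate(cds):
    """Translate a coding sequence (spaces allowed) into one-letter amino acids."""
    s = cds.lower().replace(" ", "")
    if len(s) % 3:
        raise ValueError("length is not a multiple of three")
    return "".join(CODON_TO_AA[s[i:i + 3]] for i in range(0, len(s), 3))


def recode(cds):
    """Replace every codon with the assumed preferred synonymous codon."""
    return "".join(PREFERRED[aa] for aa in translate(cds))


# First six codons of the rat THY-1 oligo 1 with envelope-type codons.
env_style = "atg aat cca gta ata agt"
recoded = recode(env_style)
```

The recoded sequence encodes the same peptide (MNPVIS) while every A- or T-ending codon has been replaced by a C- or G-ending one.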
pCDM7 or pCMVrev. The rTHY-1enveg1rre construct was made by
anchor PCR using forward and reverse primers with NheI and BamHI
restriction sites respectively. The PCR fragment was cloned into
a plasmid containing a CD5 leader and human IgG1 hinge, CH2 and
CH3 domains. Supernatants of 35S labelled cells were harvested
72 hours post transfection, precipitated with a mouse monoclonal
antibody OX7 against rTHY-1 and anti mouse IgG sepharose, and
run on a 12% reducing SDS-PAGE. The procedures used are
described in greater detail below.

As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into the
supernatant. Thus, this gene should be responsive to
rev-induction. However, in contrast to rTHY-1envPI-,
cotransfection of rev in trans induced no or only a negligible
increase of rTHY-1enveg1 expression.

The expression of rTHY-1:immunoglobulin fusion protein with
native rTHY-1 or HIV envelope codons was measured by
immunoprecipitation. Briefly, human 293T cells were transfected
with either rTHY-1enveg1 (env codons) or rTHY-1wteg1 (native
codons). The rTHY-1wteg1 construct was generated in a manner
similar to that used for the rTHY-1enveg1 construct, with the
exception that a plasmid containing the native rTHY-1 gene was
used as template. Supernatants of 35S labelled cells were
harvested 72 hours post transfection, precipitated with a mouse
monoclonal antibody OX7 against rTHY-1 and anti mouse IgG
sepharose, and run on a 12% reducing SDS-PAGE. The procedures
used in this experiment are described in greater detail below.

Expression levels of rTHY-1enveg1 were decreased in comparison
to a similar construct with wild type rTHY-1 as the fusion
partner, but were still considerably higher than rTHY-1env.
Accordingly, both parts of the fusion protein influenced
expression levels. The addition of
using the oligonucleotides cgcggggctagcgcaaagagtaataagtttaac as
forward and cgcggatcccttgtattttgtactaataa as reverse primers and
the synthetic rTHY-1env construct as template. After digestion
with NheI and NotI the PCR fragment was cloned into a plasmid
containing CD5 leader and rre sequences. Supernatants of 35S
labelled cells were harvested 72 hours post transfection,
precipitated with a mouse monoclonal antibody OX7 against rTHY-1
and anti mouse IgG sepharose, and run on a 12% reducing
SDS-PAGE.

In this experiment the induction of rTHY-1env by rev was much
more prominent and clearcut than in the above-described
experiment and strongly suggests that rev is able to
translationally regulate transcripts that are suppressed by
low-usage codons.

Rev-dependent expression of a rTHY-1env:immunoglobulin fusion protein

To test whether low-usage codons must be present
throughout the whole coding sequence or whether a short
region is sufficient to confer rev-responsiveness, a
rTHY-1env:immunoglobulin fusion protein was generated.
In this construct the rTHY-1env gene (without the
sequence motif responsible for phosphatidylinositol
glycan anchorage) is linked to the human IgG1 hinge, CH2
and CH3 domains. This construct was generated by anchor
PCR using primers with NheI and BamHI restriction sites
and rTHY-1env as template. The PCR fragment was cloned
into a plasmid containing the leader sequence of the CD5
surface molecule and the hinge, CH2 and CH3 parts of
human IgG1 immunoglobulin. A Hind3/EagI fragment
containing the rTHY-1enveg1 insert was subsequently
cloned into a pCDM7-derived plasmid with the rre
sequence.

To measure the response of the rTHY-1env/
immunoglobulin fusion gene (rTHY-1enveg1rre) to rev, human
293T cells cotransfected with rTHY-1enveg1rre and either
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responsiveness of the a rat THY-lehv construct having a 
3' RRE, human 293T cells were cotransfected with
ratTHY-1envrre and either CDM7 or pCMVrev. At 60 hours
post transfection cells were detached with 1 mM EDTA in
PBS and stained with the OX-7 anti rTHY-1 mouse
monoclonal antibody and a secondary FITC-conjugated
antibody. Fluorescence intensity was measured using an
EPICS XL cytofluorometer. These procedures are described
in greater detail below.

In repeated experiments, a slight increase of
rTHY-1env expression was detected if rev was
cotransfected with the rTHY-1env gene. To further
increase the sensitivity of the assay system a construct
expressing a secreted version of rTHY-1env was generated.
This construct should produce more reliable data because
the accumulated amount of secreted protein in the
supernatant reflects the result of protein production
over an extended period, in contrast to surface expressed
protein, which appears to more closely reflect the
current production rate. A gene capable of expressing a
secreted form was prepared by PCR using forward and
reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for
phosphatidylinositol glycan anchorage, respectively. The
PCR product was cloned into a plasmid which already
contained a CD5 leader sequence, thus generating a
construct in which the membrane anchor has been deleted
and the leader sequence exchanged for a heterologous (and
probably more efficient) leader peptide.

The rev-responsiveness of the secreted form of
ratTHY-1env was measured by immunoprecipitation of
supernatants of human 293T cells cotransfected with a
plasmid expressing a secreted form of ratTHY-1env and the
RRE sequence in cis (rTHY-1envPI-rre) and either CDM7 or
pCMVrev. The rTHY-1envPI-RRE construct was made by PCR
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Expression levels of native rTH^i and rTHY-1 with 
the HIV envelope codons were quantitated by
immunofluorescence of transiently transfected 293T cells.
Expression of the native rTHY-1 gene
is almost two orders of magnitude above the background
level of the control transfected cells (pCDM7). In
contrast, expression of the synthetic rat THY-1 is
substantially lower than that of the native gene (shown
by the shift of the peak towards a lower channel
number).



To prove that no negative sequence elements
promoting mRNA degradation were inadvertently introduced,
a construct was generated in which the rTHY-1env gene was
cloned at the 3' end of the synthetic gp120 gene (FIG. 9,
panel B). In this experiment 293T cells were transfected
with either the syngp120mn gene or the syngp120/rat THY-1
env fusion gene (syngp120mn.rTHY-1env). Expression was
measured by immunoprecipitation with CD4:IgG fusion
protein and protein A agarose. The procedures used in
this experiment are described in greater detail below.

Since the synthetic gp120 gene has a UAG stop
codon, rTHY-1env is not translated from this transcript.
If negative elements conferring enhanced degradation were
present in the sequence, gp120 protein levels expressed
from this construct should be decreased in comparison to
the syngp120mn construct without rTHY-1env. FIG. 9,
panel A shows that the expression of both constructs is
similar, indicating that the low expression must be
linked to translation.

Rev-dependent expression of a synthetic rat THY-1 gene with envelope codons

To explore whether rev is able to regulate
expression of a rat THY-1 gene having env codons, a
construct was made with a rev-binding site in the 3' end
of the rTHY-1env open reading frame. To measure rev-
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expression of both native and synthetic g n was 
investigated. Since regulation by rev requires the rev-
binding site RRE in cis, constructs were made in which
this binding site was cloned into the 3' untranslated
region of both the native and the synthetic gene. These
plasmids were co-transfected with rev or a control
plasmid in trans into 293T cells, and gp120 expression
levels in supernatants were measured semiquantitatively
by immunoprecipitation. The procedures used in this
experiment are described in greater detail below.

As shown in FIG. 5, panels A and B, rev
upregulates the native gp120 gene, but has no effect on
the expression of the synthetic gp120 gene. Thus, the
action of rev is not apparent on a substrate which lacks
the coding sequence of endogenous viral envelope
sequences.

Expression of a synthetic rat THY-1 gene with HIV
envelope codons

The above-described experiment suggests that in
fact "envelope sequences" have to be present for rev
regulation. In order to test this hypothesis, a
synthetic version of the gene encoding the small,
typically highly expressed cell surface protein, rat
THY-1 antigen, was prepared. The synthetic version of
the rat THY-1 gene was designed to have a codon usage
like that of HIV gp120. In designing this synthetic gene
AUUUA sequences, which are associated with mRNA
instability, were avoided. In addition, two restriction
sites were introduced to simplify manipulation of the
resulting gene (FIG. 6). This synthetic gene with the
HIV envelope codon usage (rTHY-1env) was generated using
three 150 to 170 mer oligonucleotides (FIG. 7). In
contrast to the syngp120mn gene, PCR products were
directly cloned and assembled in pUC12, and subsequently
cloned into pCDM7.
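
The design constraint mentioned above, avoiding AUUUA sequences, can be checked mechanically. The following is a minimal sketch and not the procedure used to design rTHY-1env; the function name find_auuua is hypothetical, and the input is assumed to be a plain string of DNA or RNA letters.

```python
def find_auuua(seq: str) -> list:
    """Return 0-based start positions of every AUUUA motif in seq.

    Accepts DNA (ATTTA) or RNA (AUUUA) spelling in either case.
    Overlapping motifs such as AUUUAUUUA are all reported.
    """
    s = seq.lower().replace("t", "u")  # normalise DNA spelling to RNA
    hits = []
    i = s.find("auuua")
    while i != -1:
        hits.append(i)
        i = s.find("auuua", i + 1)  # step by one so overlaps are found
    return hits
```

A candidate synthetic sequence with an empty result contains no AUUUA motif; any reported position marks a place where a synonymous codon could be tried instead.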
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Computer Grou£. Numberl "prLeL W * sconsin Genetics 
which a particular codon is used. Codon usage of non-
lentiviral retroviruses was compiled from the envelope
precursor sequences of bovine leukemia virus, feline
leukemia virus, human T-cell leukemia virus type I, human
T-cell lymphotropic virus type II, the mink cell focus-
forming isolate of murine leukemia virus (MuLV), the
Rauscher spleen focus-forming isolate, the 10A1 isolate,
the 4070A amphotropic isolate and the myeloproliferative
leukemia virus isolate, and from rat leukemia virus,
simian sarcoma virus, simian T-cell leukemia virus,
leukemogenic retrovirus T1223/B and gibbon ape leukemia
virus. The codon frequency tables for the non-HIV, non-
SIV lentiviruses were compiled from the envelope
precursor sequences for caprine arthritis encephalitis
virus, equine infectious anemia virus, feline
immunodeficiency virus, and visna virus.



In addition to the prevalence of A containing
codons, lentiviral codons adhere to the HIV pattern of
strong CpG underrepresentation, so that the third
position for alanine, proline, serine and threonine
triplets is rarely G. The retroviral envelope triplets
show a similar, but less pronounced, underrepresentation
of CpG. The most obvious difference between lentiviruses
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30 



prevalence lies in the usage of the CGX variant of arginine
triplets, which is reasonably frequently represented

To examine whether regulation by rev is connected
to HIV-1 codon usage, the influence of rev on the
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TABLE , 2 : Codon f requ ncy in the env 1 pe gene of 
lentiviruses (lenti) and non-lentiviral 
retroviruses (other) . 
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gptipn Usaae in T^n t Ty 1rin • 

Because it appears that codon usage has a
significant impact on expression in mammalian cells, the
codon frequency in the envelope genes of other
retroviruses was examined. This study found no clear
pattern of codon preference between retroviruses in
general. However, if viruses from the lentivirus genus,
to which HIV-1 belongs, were analyzed separately,
codon usage bias almost identical to that of HIV-1 was
found. A codon frequency table from the envelope
glycoproteins of a variety of (predominantly type C)
retroviruses excluding the lentiviruses was prepared, and
compared to a codon frequency table created from the
envelope sequences of four lentiviruses not closely
related to HIV-1 (caprine arthritis encephalitis virus,
equine infectious anemia virus, feline immunodeficiency
virus, and visna virus) (Table 2). The codon usage
pattern for lentiviruses is strikingly similar to that of
HIV-1. In all cases but one, the preferred codon for
HIV-1 is the same as the preferred codon for the other
lentiviruses. The exception is proline, which is encoded
by CCT in 41% of non-HIV lentiviral envelope residues
and by CCA in 40% of residues, a situation which clearly
also reflects a significant preference for the triplet
ending in A. The pattern of codon usage by the non-
lentiviral envelope proteins does not show a similar
predominance of A residues, and is also not as skewed
toward third position C and G residues as is the codon
usage for the highly expressed human genes. In general
non-lentiviral retroviruses appear to exploit the
different codons more equally, a pattern they share with
less highly expressed human genes.
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were quant i tat d by scanning th hybridized membran s 
with a phosphoimager. The procedures used are described
in greater detail below.

This experiment demonstrated that there was no
significant difference in the mRNA levels of cells
transfected with either the native or synthetic gp120
gene. In fact, in some experiments the cytoplasmic mRNA
level of the synthetic gp120 gene was even lower than
that of the native gp120 gene.

These data were confirmed by measuring expression
from recombinant vaccinia viruses. Human 293 cells or
HeLa cells were infected with vaccinia virus expressing
wildtype gp120 IIIb or syngp120mn at a multiplicity of
infection of at least 10. Supernatants were harvested 24
hours post infection and immunoprecipitated with
CD4:immunoglobulin fusion protein and protein A sepharose.
The procedures used in this experiment are described in
greater detail below.

This experiment showed that the increased
expression of the synthetic gene was still observed when
the endogenous gene product and the synthetic gene
product were expressed from vaccinia virus recombinants
under the control of the strong mixed early and late 7.5k
promoter. Because vaccinia virus mRNAs are transcribed
and translated in the cytoplasm, increased expression of
the synthetic envelope gene in this experiment cannot be
attributed to improved export from the nucleus. This
experiment was repeated in two additional human cell
types, the kidney cancer cell line 293 and HeLa cells.
As with transfected 293T cells, mRNA levels were similar
in 293 cells infected with either recombinant vaccinia
virus.
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CD4 in the demobilized phase! ' This analysis shows (FIG. 
4) that ELISA data were comparable to the
immunoprecipitation data, with a gp120 concentration of
approximately 125 ng/ml for the synthetic gp120 gene, and
less than the background cutoff (5 ng/ml) for all the
native gp120 genes. Thus, expression of the synthetic
gp120 gene appears to be at least one order of magnitude
higher than wildtype gp120 genes. In the experiment
shown the increase was at least 25 fold.
The Role of rev in gp120 Expression

Since rev appears to exert its effect at several
steps in the expression of a viral transcript, the
possible role of non-translational effects in the
improved expression of the synthetic gp120 gene was
tested. First, to rule out the possibility that negative
signal elements conferring either increased mRNA
degradation or nuclear retention were eliminated by
changing the nucleotide sequence, cytoplasmic mRNA levels
were tested. Cytoplasmic RNA was prepared by NP40 lysis
of transiently transfected 293T cells and subsequent
elimination of the nuclei by centrifugation. Cytoplasmic
RNA was subsequently prepared from lysates by multiple
phenol extractions and precipitation, spotted on
nitrocellulose using a slot blot apparatus, and finally
hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA of 293 cells transfected
with CDM7, gp120 IIIB, or syngp120 was isolated 36 hours
post transfection. Cytoplasmic RNA of HeLa cells
infected with wildtype vaccinia virus or recombinant
virus expressing gp120 IIIb or the synthetic gp120 gene
under the control of the 7.5 promoter was isolated 16
hours post infection. Equal amounts were spotted on
nitrocellulose using a slot blot device and hybridized
with randomly labelled 1.5 kb gp120IIIb and syngp120
fragments or human beta-actin. RNA expression levels
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T compare the wild- type 'and synthetic gpl20 
coding sequences, the synthetic gp120 coding sequence was
inserted into a mammalian expression vector and tested in
transient transfection assays. Several different native
gp120 genes were used as controls to exclude variations
in expression levels between different virus isolates and
artifacts induced by distinct leader sequences. The
gp120 HIV IIIb construct used as control was generated by
PCR using a SalI/XhoI HIV-1 HXB2 envelope fragment as
template. To exclude PCR induced mutations a KpnI/EarI
fragment containing approximately 1.2 kb of the gene was
exchanged with the respective sequence from the proviral
clone. The wildtype gp120mn constructs used as controls
were cloned by PCR from HIV-1 MN infected C8166 cells
(AIDS Repository, Rockville, MD) and expressed gp120
either with a native envelope or a CD5 leader sequence.
Since proviral clones were not available in this case,
two clones of each construct were tested to avoid PCR
artifacts. To determine the amount of secreted gp120
semi-quantitatively, supernatants of 293T cells
transiently transfected by calcium phosphate
coprecipitation were immunoprecipitated with soluble
CD4:immunoglobulin fusion protein and protein A
sepharose.

The results of this analysis (FIG. 3) show that
the synthetic gene product is expressed at a very high
level compared to that of the native gp120 controls. The
molecular weight of the synthetic gp120 gene product was
comparable to control proteins (FIG. 3) and appeared to
be in the range of 100 to 110 kd. The slightly faster
migration can be explained by the fact that in some tumor
cell lines like 293T glycosylation is either not complete
or altered to some extent.

To compare expression more accurately gp120
protein levels were quantitated using a gp120 ELISA with
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adjacent fragments could be co-amplif led because f 
overlapping sequences at the end of either fragment.
These fragments, which were between 350 and 400 bp in
size, were subcloned into a pCDM7-derived plasmid
containing the leader sequence of the CD5 surface
molecule followed by a NheI/PstI/MluI/EcoRI/BamHI
polylinker. Each of the restriction enzymes in this
polylinker represents a site that is present at either
the 5' or 3' end of the PCR-generated fragments. Thus,
by sequential subcloning of each of the 4 long fragments,
the whole gp120 gene was assembled. For each fragment 3
to 6 different clones were subcloned and sequenced prior
to assembly. A schematic drawing of the method used to
construct the synthetic gp120 is shown in FIG. 2. The
sequence of the synthetic gp120 gene (and a synthetic
gp160 gene created using the same approach) is presented
in FIG. 1.

The mutation rate was considerable. The most
commonly found mutations were short (1 nucleotide) and
long (up to 30 nucleotides) deletions. In some cases it
was necessary to exchange parts with either synthetic
adapters or pieces from other subclones without mutation
in that particular region. Some deviations from strict
adherence to optimized codon usage were made to
accommodate the introduction of restriction sites into
the resulting gene to facilitate the replacement of
various segments (FIG. 2). These unique restriction sites
were introduced into the gene at approximately 100 bp
intervals. The native HIV leader sequence was exchanged
with the highly efficient leader peptide of the human CD5
antigen to facilitate secretion. The plasmid used for
construction is a derivative of the mammalian expression
vector pCDM7 transcribing the inserted gene under the
control of a strong human CMV immediate early promoter.
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Codon frequency was calculated using the GCG program 
established by the University of Wisconsin Genetics
Computer Group. Numbers represent the percentage of
cases in which the particular codon is used. Codon usage
frequencies of envelope genes of other HIV-1 virus
isolates are comparable and show a similar bias.
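
To make the quantity in the table footnote concrete (the percentage of cases in which a particular codon is used for its amino acid), here is a minimal sketch of such a tabulation. It is an illustration only and does not reproduce the GCG program; the function name codon_usage is hypothetical, and the standard genetic code and an in-frame DNA coding sequence are assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in t/c/a/g order (third base varies fastest).
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(orf: str) -> dict:
    """Map amino acid -> {codon: percentage among that amino acid's codons}.

    orf is read in frame from its first base; a trailing partial codon
    is ignored. Letters other than a/c/g/t raise KeyError.
    """
    orf = orf.lower()
    usable = len(orf) - len(orf) % 3
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    table = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODE[codon]
        table[aa][codon] = 100.0 * n / totals[aa]
    return dict(table)
```

For a toy sequence with three AGA codons and one CGG codon, the arginine entry reads 75% AGA and 25% CGG, which is the kind of figure the codon frequency tables report.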



In order to produce a gp120 gene capable of high
level expression in mammalian cells, a synthetic gene
encoding the gp120 segment of HIV-1 was constructed
(syngp120mn), based on the sequence of the most common
North American subtype, HIV-1 MN (Shaw et al. 1984; Gallo
et al. 1986). In this synthetic gp120 gene nearly all of
the native codons have been systematically replaced with
codons most frequently used in highly expressed human
genes (FIG. 1). This synthetic gene was assembled from
chemically synthesized oligonucleotides of 150 to 200
bases in length. If oligonucleotides exceeding 120 to
150 bases are chemically synthesized, the percentage of
full-length product can be low, and the vast excess of
material consists of shorter oligonucleotides. Since
these shorter fragments inhibit cloning and PCR
procedures, it can be very difficult to use
oligonucleotides exceeding a certain length. In order to
use crude synthesis material without prior purification,
single-stranded oligonucleotide pools were PCR amplified
before cloning. PCR products were purified in agarose
gels and used as templates in the next PCR step. Two
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frequently used variants can be accounted t r by the 
observation that the dinucleotide CpG is
underrepresented; thus the third position is less likely
to be G whenever the second position is C, as in the
codons for alanine, proline, serine and threonine; and

TABLE 1. Codon frequency in the HIV-1 IIIb env gene
and in highly expressed human genes.

High Env High Env
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Notl site of the synjgpi20mn plasmid and' tested for 
correct orientation. Supernatants of 35S labelled cells
were harvested 72 hours post transfection, precipitated
with CD4:IgG fusion protein and protein A agarose, and
run on a 7% reducing SDS-PAGE. Figure 9, panel B is a
schematic diagram of the constructs used in the
experiment depicted in panel A of this figure.


Description of the Preferred Embodiments 

Construction of a Synthetic gp120 Gene Having Codons
Found in Highly Expressed Human Genes

A codon frequency table for the envelope precursor
of the LAV subtype of HIV-1 was generated using software
developed by the University of Wisconsin Genetics
Computer Group. The results of that tabulation are
contrasted in Table 1 with the pattern of codon usage by
a collection of highly expressed human genes. For any
amino acid encoded by degenerate codons, the most favored
codon of the highly expressed genes is different from the
most favored codon of the HIV envelope precursor.
Moreover a simple rule describes the pattern of favored
envelope codons wherever it applies: preferred codons
maximize the number of adenine residues in the viral RNA.
In all cases but one
this means that the codon in which the third position is
A is the most frequently used. In the special case of
serine, three codons equally contribute one A residue to
the mRNA; together these three comprise 85% of the codons
actually used in envelope transcripts. A particularly
striking example of the A bias is found in the codon
choice for arginine, in which the AGA triplet comprises
88% of all codons. In addition to the preponderance of A
residues, a marked preference is seen for uridine among
degenerate codons whose third residue must be a
pyrimidine. Finally, the inconsistencies among the less
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, Figure 6 is a comparison of the seguenc f the 
wildtype rat THY-1 gene (wt) (SEQ ID NO: 37) and a
synthetic rat THY-1 gene (env) (SEQ ID NO: 36)
constructed by chemical synthesis and having the most
prevalent codons found in the HIV-1 env gene.

Figure 7 is a schematic diagram of the synthetic
ratTHY-1 gene. The solid black box denotes the signal
peptide. The shaded box denotes the sequences in the
precursor which direct the attachment of a
phosphatidylinositol glycan anchor. Unique restriction
sites used for assembly of the THY-1 constructs are marked H
(Hind3), M (MluI), S (SacI) and No (NotI). The position
of the synthetic oligonucleotides employed in the
construction is shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow
cytometry analysis. In this experiment 293T cells were
transiently transfected with either wildtype rat THY-1
(dark line), ratTHY-1 with envelope codons (light line)
or vector only (dotted line). 293T cells were
transfected with the different expression plasmids by
calcium phosphate coprecipitation and stained with anti-
ratTHY-1 monoclonal antibody OX7 followed by a polyclonal
FITC-conjugated anti-mouse IgG antibody 3 days after
transfection.

Figure 9, panel A is a photograph of a gel
illustrating the results of immunoprecipitation analysis
of supernatants of human 293T cells transfected with
either syngp120mn (A) or a construct syngp120mn.rTHY-1env
which has the rTHY-1env gene in the 3' untranslated
region of the syngp120mn gene (B). The
syngp120mn.rTHY-1env construct was generated by inserting
a NotI adapter into the blunted Hind3 site of the
rTHY-1env plasmid. Subsequently, a 0.5 kb NotI fragment
containing the rTHY-1env gene was cloned into the
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, Figure 4 is a graph d picting the results f ELISA 
assays used to measure protein levels in supernatants of
transiently transfected 293T cells. Supernatants of 293T
cells transfected with plasmids expressing gp120 encoded
by the IIIB isolate of HIV-1 (gp120 IIIb), by the MN
isolate (gp120mn), by the MN isolate modified by
substitution of the endogenous leader peptide with that
of CD5 antigen (gp120mnCD5L), or by the chemically
synthesized gene encoding the MN variant with human CD5
leader (syngp120mn) were harvested after 4 days and
tested in a gp120/CD4 ELISA. The level of gp120 is
expressed in ng/ml.

Figure 5, panel A is a photograph of a gel
illustrating the results of an immunoprecipitation assay
used to measure expression of the native and synthetic
gp120 in the presence of rev in trans and the RRE in cis.
In this experiment 293T cells were transiently
transfected by calcium phosphate coprecipitation of 10 µg
of plasmid expressing: (A) the synthetic gp120MN sequence
and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, (C)
the gp120 portion of HIV-1 IIIB and RRE in cis, all in
the presence or absence of rev expression. The RRE
constructs gp120IIIbRRE and syngp120mnRRE were generated
using an EagI/HpaI RRE fragment cloned by PCR from a
HIV-1 HXB2 proviral clone. Each gp120 expression plasmid
was cotransfected with 10 µg of either pCMVrev or CDM7
plasmid DNA. Supernatants were harvested 60 hours post
transfection, immunoprecipitated with CD4:IgG fusion
protein and protein A agarose, and run on a 7% reducing
SDS-PAGE. The gel exposure time was extended to allow the
induction of gp120IIIbrre by rev to be demonstrated.
Figure 5, panel B is a shorter exposure of a similar
experiment in which syngp120mnrre was cotransfected with
or without pCMVrev. Figure 5, panel C is a schematic
diagram of the constructs used in panel A.
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Press, publisher, B rkeley, CA (1981); Maniatis, T . t 

al., Molecular Cloning: A Laboratory Manual, 2nd Ed.,
Cold Spring Harbor Laboratory, publisher, Cold Spring
Harbor, NY (1989); and Current Protocols in Molecular
Biology, Ausubel et al., Wiley Press, New York, NY
(1989).

Detailed Description

Description of the Drawings
Figure 1 depicts the sequence of the synthetic
gp120 (SEQ ID NO: 34) and a synthetic gp160 (SEQ ID NO:
35) gene in which codons have been replaced by those
found in highly expressed human genes.

Figure 2 is a schematic drawing of the synthetic
gp120 (HIV-1 MN) gene. The shaded portions marked V1 to
V5 indicate hypervariable regions. The filled box
indicates the CD4 binding site. A limited number of the
unique restriction sites are shown: H (Hind3), Nh
(NheI), P (PstI), Na (NaeI), M (MluI), R (EcoRI), A
(AgeI) and No (NotI). The chemically synthesized DNA
fragments which served as PCR templates are shown below
the gp120 sequence, along with the locations of the
primers used for their amplification.

Figure 3 is a photograph of the results of
transient transfection assays used to measure gp120
expression. Gel electrophoresis of immunoprecipitated
supernatants of 293T cells transfected with plasmids
expressing gp120 encoded by the IIIB isolate of HIV-1
(gp120IIIb), by the MN isolate (gp120mn), by the MN
isolate modified by substitution of the endogenous leader
peptide with that of the CD5 antigen (gp120mnCD5L), or by
the chemically synthesized gene encoding the MN variant
with the human CD5 leader (syngp120mn). Supernatants were
harvested following a 12 hour labeling period 60 hours
post-transfection and immunoprecipitated with CD4:IgG1
fusion protein and protein A sepharose.
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DNA expr ssion vectors includ mammalian plasmids and 
viruses.

The invention also features synthetic gene
fragments which encode a desired portion of the protein.
Such synthetic gene fragments are similar to the
synthetic genes of the invention except that they encode
only a portion of the protein. Such gene fragments
preferably encode at least 50, 100, 150, or 500
contiguous amino acids of the protein.

In constructing the synthetic genes of the
invention it may be desirable to avoid CpG sequences as
these sequences may cause gene silencing.
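
As an illustration of how CpG sequences in a candidate synthetic gene might be located, the sketch below lists every CpG dinucleotide and, separately, those formed where one codon ends in C and the next begins with G. The function names cpg_positions and cpg_at_codon_junctions are hypothetical, and the input is assumed to be an in-frame DNA sequence; this is not a procedure described in the specification.

```python
def cpg_positions(seq: str) -> list:
    """Return 0-based positions of the C in every CpG dinucleotide."""
    s = seq.lower()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "cg"]


def cpg_at_codon_junctions(orf: str) -> list:
    """Return CpG positions that straddle two codons of an in-frame ORF.

    A CpG straddles a junction when its C is the third base of a codon,
    i.e. when the 0-based position of the C is 2 modulo 3.
    """
    return [i for i in cpg_positions(orf) if i % 3 == 2]
```

For example, the codon pair gcc gtg contains no CpG within either codon but creates one at the junction, so a check of this kind has to look across codon boundaries as well as inside codons.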

The codon bias present in the HIV gp120 envelope
gene is also present in the gag and pol proteins. Thus,
replacement of a portion of the non-preferred and less
preferred codons found in these genes with preferred
codons should produce a gene capable of higher level
expression. A large fraction of the codons in the human
genes encoding Factor VIII and Factor IX are non-
preferred codons or less preferred codons. Replacement
of a portion of these codons with preferred codons should
yield genes capable of higher level expression in
mammalian cell culture. Conversely, it may be desirable
to replace preferred codons in a naturally occurring gene
with less-preferred codons as a means of lowering
expression.

Standard reference works describing the general
principles of recombinant DNA technology include Watson,
J.D. et al., Molecular Biology of the Gene, Volumes I and
II, the Benjamin/Cummings Publishing Company, Inc.,
publisher, Menlo Park, CA (1987); Darnell, J.E. et al.,
Molecular Cell Biology, Scientific American Books, Inc.,
Publisher, New York, N.Y. (1986); Old, R.W., et al.,
Principles of Gene Manipulation: An Introduction to
Genetic Engineering, 2d edition, University of California
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In a preferred embodiment the protein is a 
retroviral protein. In a more preferred embodiment the
protein is a lentiviral protein. In an even more
preferred embodiment the protein is an HIV protein. In
other preferred embodiments the protein is gag, pol, env,
gp120, or gp160. In other preferred embodiments the
protein is a human protein.

The invention also features a method for preparing
a synthetic gene encoding a protein normally expressed by
mammalian cells. The method includes identifying non-
preferred and less-preferred codons in the natural gene
encoding the protein and replacing one or more of the
non-preferred and less-preferred codons with a preferred
codon encoding the same amino acid as the replaced codon.

Under some circumstances (e.g., to permit
introduction of a restriction site) it may be desirable
to replace a non-preferred codon with a less preferred
codon rather than a preferred codon.

It is not necessary to replace all less preferred
or non-preferred codons with preferred codons. Increased
expression can be accomplished even with partial
replacement.
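
The replacement step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used to build the synthetic genes of the examples: the function names replace_codons and translate and the keep parameter are hypothetical, the standard genetic code is assumed, and amino acids for which the preferred-codon list names no codon (and stop codons) are left unchanged.

```python
from itertools import product

# Standard genetic code, built in t/c/a/g order (third base varies fastest).
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Preferred codons as listed in the Summary, keyed by one-letter amino acid.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc", "Q": "cag",
    "G": "ggc", "H": "cac", "I": "atc", "L": "ctg", "K": "aag", "P": "ccc",
    "F": "ttc", "S": "agc", "T": "acc", "Y": "tac", "V": "gtg",
}


def translate(orf: str) -> str:
    """Translate an in-frame DNA sequence to one-letter amino acids."""
    orf = orf.lower()
    return "".join(CODE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def replace_codons(orf: str, keep=()) -> str:
    """Swap each codon for the preferred codon of the same amino acid.

    keep is an optional collection of 0-based codon indices to leave
    untouched (partial replacement, e.g. to preserve a restriction site).
    """
    orf = orf.lower()
    if len(orf) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for n, i in enumerate(range(0, len(orf), 3)):
        codon = orf[i:i + 3]
        aa = CODE[codon]
        out.append(codon if n in keep or aa not in PREFERRED else PREFERRED[aa])
    return "".join(out)
```

Because every substitution is synonymous, translating the output gives the same protein as translating the input; passing codon indices in keep models the partial replacement that the text says can still increase expression.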

In other preferred embodiments the invention
features vectors (including expression vectors)
comprising the synthetic gene.

By "vector" is meant a DNA molecule, derived,
e.g., from a plasmid, bacteriophage, or mammalian or
insect virus, into which fragments of DNA may be inserted
or cloned. A vector will contain one or more unique
restriction sites and may be capable of autonomous
replication in a defined host or vehicle organism such
that the cloned sequence is reproducible. Thus, by
"expression vector" is meant any autonomous element
capable of directing the synthesis of a protein. Such
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i By protein n rmally expressed in mammalian c lis 
is meant a protein which is expressed in mammalian cells
under natural conditions. The term includes genes in the
mammalian genome such as Factor VIII, Factor IX,
interleukins, and other proteins. The term also includes
genes which are expressed in a mammalian cell under
disease conditions such as oncogenes as well as genes
which are encoded by a virus (including a retrovirus)
which are expressed in mammalian cells post-infection.

In preferred embodiments, the synthetic gene is
capable of expressing said mammalian protein at a level
which is at least 110%, 150%, 200%, 500%, 1,000%, or
10,000% of that expressed by said natural gene in an in
vitro mammalian cell culture system under identical
conditions (i.e., same cell type, same culture
conditions, same expression vector).

Suitable cell culture systems for measuring
expression of the synthetic gene and corresponding
natural gene are described below. Other suitable
expression systems employing mammalian cells are well
known to those skilled in the art and are described in,
for example, the standard molecular biology reference
works noted below. Vectors suitable for expressing the
synthetic and natural genes are described below and in
the standard reference works described below. By
"expression" is meant protein expression. Expression can
be measured using an antibody specific for the protein of
interest. Such antibodies and measurement techniques are
well known to those skilled in the art. By "natural
gene" is meant the gene sequence which naturally encodes
the protein.

In other preferred embodiments at least 10%, 20%,
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the codons in the
natural gene are non-preferred codons.
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OVEREXPRBPPTQN OF MAMMAIiTam awn yy p AT. PPrvr P r H ff 
Field of the Invention
The invention concerns genes and methods for
expressing eukaryotic and viral proteins at high levels
in eukaryotic cells.

Background of the Invention
Expression of eukaryotic gene products in
prokaryotes is sometimes limited by the presence of
codons that are infrequently used in E. coli. Expression
of such genes can be enhanced by systematic substitution
of the endogenous codons with codons overrepresented in
highly expressed prokaryotic genes (Robinson et al.
1984). It is commonly supposed that rare codons cause
pausing of the ribosome, which leads to a failure to
complete the nascent polypeptide chain and an uncoupling
of transcription and translation. The mRNA 3' end of the
stalled ribosome is exposed to cellular ribonucleases,
which decreases the stability of the transcript.
Summary of the Invention

The invention features a synthetic gene encoding a
protein normally expressed in mammalian cells wherein at
least one non-preferred or less preferred codon in the
natural gene encoding the mammalian protein has been
replaced by a preferred codon encoding the same amino
acid.

Preferred codons are: Ala (gcc); Arg (cgc); Asn
(aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His
(cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe
(ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg).
Less preferred codons are: Gly (ggg); Ile (att); Leu
(ctc); Ser (tcc); Val (gtc). All codons which do not fit
the description of preferred codons or less preferred
codons are non-preferred codons.
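
The three codon classes defined above lend themselves to a simple lookup. The following is a minimal sketch of that classification; the names classify_codon and tally_classes are hypothetical. It applies the stated rule literally, so any triplet absent from both lists (including the single codons for Met and Trp, the Glu codons, and stop codons, none of which the lists name) is reported as non-preferred, which may be broader than intended.

```python
# Codon lists as given in the text (lower-case DNA triplets).
PREFERRED = {
    "gcc", "cgc", "aac", "gac", "tgc", "cag", "ggc", "cac", "atc",
    "ctg", "aag", "ccc", "ttc", "agc", "acc", "tac", "gtg",
}
LESS_PREFERRED = {"ggg", "att", "ctc", "tcc", "gtc"}


def classify_codon(codon: str) -> str:
    """Return 'preferred', 'less preferred' or 'non-preferred'."""
    c = codon.lower().replace("u", "t")  # accept RNA or DNA spelling
    if len(c) != 3 or set(c) - set("acgt"):
        raise ValueError(f"not a codon: {codon!r}")
    if c in PREFERRED:
        return "preferred"
    if c in LESS_PREFERRED:
        return "less preferred"
    return "non-preferred"  # literal reading: everything not listed


def tally_classes(orf: str) -> dict:
    """Count codon classes over an in-frame coding sequence."""
    if len(orf) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    counts = {"preferred": 0, "less preferred": 0, "non-preferred": 0}
    for i in range(0, len(orf), 3):
        counts[classify_codon(orf[i:i + 3])] += 1
    return counts
```

A tally of this kind gives the fraction of non-preferred codons in a natural gene, the quantity referred to in the embodiments that specify at least 10% to 90% non-preferred codons.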
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